Abstract



Contemporary linguistic formalisms have become so rigorous that it is now possible to view
them as very high-level declarative programming languages. Consequently, grammars for natural
languages can be viewed as programs; this view enables the application of various methods and
techniques that have proved useful for programming languages to the study of natural languages.

One of the most successful implementation techniques for logic programming languages involves 
the use of an abstract machine. In this approach one defines an abstract machine with the fol- 
lowing properties: it is close enough to the high-level language, thus allowing efficient compilation 
to the abstract machine language; and it is sufficiently low-level to allow efficient interpretation 
of the machine instructions on a variety of host architectures. Abstract machines were used for 
processing procedural and functional languages, but they gained much popularity for logic
programming languages since the introduction of the Warren Abstract Machine (WAM). Most current
implementations of Prolog, as well as other logic languages, are based on abstract machines. The 
incorporation of such techniques usually leads to very efficient compilers in terms of both space 
and time requirements. 

In this work we have designed and implemented an abstract machine, AMALIA, for the
linguistic formalism ALE, which is based on typed feature structures. This formalism is one of
the most widely accepted in computational linguistics and has been used for designing grammars
in various linguistic theories, most notably HPSG. AMALIA is composed of data structures and
a set of instructions, augmented by a compiler from the grammatical formalism to the abstract 
instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction 
is defined using a low-level language that can be executed on ordinary hardware. 

The advantages of the abstract machine approach are twofold. From a theoretical point of view,
the abstract machine gives a well-defined operational semantics to the grammatical formalism. This
ensures that grammars specified using our system are endowed with a well-defined meaning. It makes
it possible, for example, to formally verify the correctness of a compiler for HPSG, given an
independent definition. From a practical point of view, AMALIA is the first system that employs a
direct compilation scheme for unification grammars that are based on typed feature structures. The
use of AMALIA results in much improved performance over existing systems.

In order to test the machine on a realistic application, we have developed a small-scale,
HPSG-based grammar for a fragment of the Hebrew language, using AMALIA as the development
platform. This is the first application of HPSG to a Semitic language.
Chapter 1

Introduction 



1.1 Motivation 



Research in linguistics has traditionally been aimed at describing the structure of Natural Lan- 
guages (NLs). Since the 1950s, however, the focus has shifted from attempts to provide such 
descriptions to the definition of the right way in which to stipulate them. During the past few 
decades many such formalisms were devised. A 'good' model, according to Shieber (1986), is
linguistically felicitous, expressive and computationally effective. It must be powerful enough to 
capture the wealth and diversity of NLs, yet it must be computationally tractable to allow for 
computational processing. 

Contemporary linguistic formalisms such as LFG (Kaplan and Bresnan, 1982) or HPSG (Pollard
and Sag, 1994) have become so rigorous that it is now possible to view them as very high-level
declarative programming languages. In this metaphor a grammar for a natural language, formally
specified using one of the modern frameworks described above, can be viewed as a program. The
execution of a grammar on an input sentence yields an output which represents the sentence's
structure. This view enables the application of various methods and techniques that have proved
useful for programming languages to the study of natural languages.

Historically, many computational fields of research originated from the study of natural
languages: important aspects of the theory of formal languages are due to Chomsky, for example; and
more recently, Prolog originated out of an attempt to provide a language for the description of
natural languages. Today, however, much progress has been achieved in the area of programming
languages. Tools and techniques have been developed that enable efficient processing of such
languages and, more importantly, formal propositions to be made and proved over languages in
general and specific programs in particular. These advances are now being incorporated into the
realm of natural languages. Grammars for natural languages are specified more precisely; their
properties can be mathematically stated; and their processing becomes more efficient. For a survey
of some such approaches, see (Shieber, 1986); for examples of the advantages of regarding natural
language formalisms as programming languages, see (Barton, Berwick, and Ristad, 1987;
Manaster-Ramer, 1987).

This work introduces such an application: an implementation technique that is common for 
logic programming languages, namely the use of an abstract machine, is applied to (a subset of) 
the ALE formalism (Carpenter, 1992a), originally designed for specifying feature-structure based 
phrase-structure grammars. Abstract machines were used for processing procedural and functional 
languages, but they gained much popularity for logic programming languages since the introduction
of the Warren Abstract Machine (WAM; see (Warren, 1983) and a tutorial in (Aït-Kaci, 1991)).
Most current implementations of Prolog, as well as other logic languages, are based on abstract
machines. The incorporation of such techniques usually leads to very efficient compilers in terms
of both space and time requirements.

AMALIA is an abstract machine, specifically tailored for processing ALE grammars. It is
composed of data structures and a set of instructions, augmented by a compiler from the
grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract
instructions. The effect of each instruction is defined using a low-level language that can be
executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a
theoretical point of view, the abstract machine gives a well-defined operational semantics to the
grammatical formalism. This ensures that grammars specified using our system are endowed with a
well-defined meaning. It makes it possible, for example, to formally verify the correctness of a
compiler for HPSG, given an independent definition. From a practical point of view, AMALIA is the
first system that employs a direct compilation scheme for unification grammars that are based on
typed feature structures. The use of AMALIA results in much improved performance over existing
systems (in particular, ALE itself).



1.2 Literature Survey 



1.2.1 Grammatical Formalisms 

Much of the recent research in computational linguistics has been directed towards defining a good 
model in which natural languages would be naturally describable. Having its roots in the study 
of formal languages, this endeavor started with considering the computational power needed for 
describing natural languages in general; context free grammars were thus ruled out quite early. 
But even if one did believe that natural languages were, indeed, within the scope of context free
languages, one had to admit that context free grammars were not the ideal framework in which to
develop grammars for natural languages. It was understood that the weak generation properties
of a grammar in a given formalism (i.e., its ability to recognize all and only the sentences of a
language) are not sufficient: there is a need to provide syntactic descriptions that cohere with
the way linguists capture the language.

The resulting trend in computational linguistics was to use unification-based formalisms to
achieve these two goals. While many such frameworks were developed (see (Shieber, 1986) for a
good review), some notions are common to most of them. They are all based on a context free
skeleton, where non-terminal symbols are replaced with structured, more complex entities; and
the basic operation on these structures is unification. Among these frameworks are Functional
Unification Grammar (Kay, 1983), Lexical Functional Grammar (Kaplan and Bresnan, 1982),
Generalized Phrase-Structure Grammar (Gazdar et al., 1985) and many others.

A unification-based grammar formalism is a meta-language for describing grammars for (natural)
languages. The basic entity of such formalisms is the feature structure: a data structure
consisting of a set of feature-value pairs. While different frameworks define feature structures
differently, they can in general be captured as directed graphs, where the arcs are labeled with
feature names and an f-labeled arc connects nodes v and u if and only if the value of the feature
f in the feature structure associated with v is the feature structure associated with u. For a good,
informal survey of feature structures and their properties refer to Shieber (1986).
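
As an informal illustration only (it is not part of the formalism defined in this work), the graph
view of feature structures can be sketched in Python. The class name Node, the helper value_at and
the feature names subj, pred, agr, num and per are hypothetical, chosen for the example. The sketch
shows a value that is shared by two paths, which is what makes the structure a graph rather than a
tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity, as graph nodes should be
class Node:
    """A node of a feature-structure graph (hypothetical representation)."""
    arcs: Dict[str, "Node"] = field(default_factory=dict)  # feature name -> target node
    atom: Optional[str] = None                              # optional atomic value


def value_at(node: Node, path: List[str]) -> Node:
    """Follow a sequence of feature-labeled arcs starting from the given node."""
    for feature in path:
        node = node.arcs[feature]
    return node


# One agreement node is the value of two different paths (a shared, or reentrant, value).
agr = Node(arcs={"num": Node(atom="sg"), "per": Node(atom="3")})
sentence = Node(arcs={
    "subj": Node(arcs={"agr": agr}),
    "pred": Node(arcs={"agr": agr}),
})
```

Here the paths subj.agr and pred.agr lead to the very same node, so any information added to one is
automatically visible through the other.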



Feature structures can be thought of as an extension of first order terms (see (Carpenter, 1991;
Aït-Kaci and Podelski, 1993)), where the sub-terms are coded by feature names rather than by
positions. They extend first order terms in that they are in general graphs, whereas first order
terms are trees, with possibly shared leaves. Hence, a term might be a common part of more than 
one sub-term. Some grammatical formalisms decorate feature structures with types, or sorts, that 
can be captured by labels on the nodes of the graph. Feature structures are used by grammatical 
formalisms to represent linguistic concepts such as words, phrases and sometimes even grammatical 
rules. 

The basic operation on feature structures is unification. Being very much like first-order term 
unification, this operation combines the information that is encoded by two feature structures and 
produces a result that contains the unified information, provided that the two arguments don't 
contain contradicting information. If the arguments are inconsistent, unification is said to fail. 
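
The behaviour described above can be sketched for a deliberately simplified case. The sketch
assumes untyped feature structures represented as nested Python dictionaries with strings as atomic
values, and it ignores shared values, so it is not the unification algorithm of this work. The
names unify and UnificationFailure are hypothetical.

```python
class UnificationFailure(Exception):
    """Raised when the two arguments carry contradicting information."""


def unify(a, b):
    """Unify two simplified feature structures (nested dicts with string atoms).

    Returns a new structure that combines the information of both arguments;
    the arguments themselves are left unchanged.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # start from the information in a
        for feature, value in b.items():
            if feature in a:
                result[feature] = unify(a[feature], value)  # both define it: recurse
            else:
                result[feature] = value                     # only b defines it
        return result
    if a == b:  # two identical atoms
        return a
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")
```

For example, unifying {"cat": "np", "agr": {"num": "sg"}} with {"agr": {"per": "3"}} combines the
two agr values, whereas unifying a structure whose num is "sg" with one whose num is "pl" fails.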

In this work we are mainly concerned with typed feature structures (TFSs), as described in
(Carpenter, 1992b). As their name suggests, each such structure has a type, drawn from a pre-defined,
partially ordered set of types. The type hierarchy helps the grammar writer to organize linguis- 
tic knowledge in a similar way to common knowledge representation languages. The hierarchy is 
accompanied by an appropriateness specification that associates features with types; for example, 
the 'case' feature might be defined to be appropriate for feature structures of type 'noun' but not 
for structures of type 'verb'. Moreover, appropriateness is inherited: if a feature is appropriate for 
a type t, then it is appropriate for all the sub-types of t as well. This property is reminiscent of 
object-oriented systems; in particular, multiple inheritance is supported in this framework. 
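
The inheritance of appropriateness can be sketched as follows. The type names (bot, sign, noun,
verb, gerund), the feature names and the table layout are hypothetical and serve only to illustrate
the idea; they are not taken from any grammar discussed in this work.

```python
# Each type lists its immediate super-types; "gerund" has two (multiple inheritance).
SUPERTYPES = {
    "bot": [],
    "sign": ["bot"],
    "noun": ["sign"],
    "verb": ["sign"],
    "gerund": ["noun", "verb"],
}

# Features are declared once, on the most general type for which they are appropriate.
DECLARED = {
    "sign": {"phon"},
    "noun": {"case"},
    "verb": {"tense"},
}


def ancestors(t):
    """Return t together with all of its (direct and indirect) super-types."""
    seen = {t}
    stack = [t]
    while stack:
        for parent in SUPERTYPES[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def appropriate_features(t):
    """A feature is appropriate for t if it is declared on t or on any super-type of t."""
    features = set()
    for s in ancestors(t):
        features |= DECLARED.get(s, set())
    return features
```

With these tables, case is appropriate for noun but not for verb, and gerund inherits the features
of both of its super-types.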

It is important to note that while some of the above-mentioned formalisms were designed as 
computational frameworks for developing grammars, others were linguistically oriented in the sense 
that a grammatical theory was encoded within them in one way or another. Obviously, any such 
formalism defines at least the expressive power of grammars that can be stipulated within it. But 
many other linguistic considerations and generalizations can be, and actually are, hard-wired into 
some formalisms. 



1.2.2 The Current Role of HPSG 



Recently HPSG has become prominent among the various unification-based formalisms. HPSG 
(Pollard and Sag, 1987; Pollard and Sag, 1994) was developed by Pollard and Sag as a variant of 
GPSG and Categorial Grammar, but immediately gained the position of a well-founded, promising
formalism for the description of natural languages. It incorporates the notion of typed feature 
structures, where types are partially ordered according to a defined hierarchy, thus enabling very 
concise, general rules to be stipulated. Much of the information carried by linguistic entities is 
stored in the lexicon; as a result, grammar rules become few and very general. HPSG defines a 
set of linguistically plausible schemas, or universal principles, that are said to hold for all natural 
languages and are part of every grammar. In addition, language specific rules can be specified in 
any given grammar. 

Due to its generality and elegance, HPSG has gained a lot of popularity. It enables the designing
of grammars for various, linguistically different, languages: work has been done on HPSG grammars
for English, German (Nerbonne, Netter, and Pollard, 1994), French, Japanese (JPSG Working
Group, In Preparation), Korean and many other languages (see a bibliography in (Calcagno,
Kathol, and Pollard, 1993) and an electronic bibliography in (Müller, 1996)). HPSG principles
were used to describe not only the syntax and semantics of languages, but also their morphology
(e.g., (Nerbonne, 1992)) and phonology (e.g., (Bird, 1990; Bird, 1992)). It seems that linguists find
this kind of typed-feature-structure-based formalism, with lexical rules and a small set of very
general grammatical rules, very convenient.

In spite of the interest that HPSG attracts, no formal definition of the formalism exists. Both
(Pollard and Sag, 1987) and (Pollard and Sag, 1994) are rather linguistically oriented, in the sense
that no mathematical definitions are given for the language of HPSG itself. King (1989; 1992) gives
a logical formalization of Pollard and Sag (1987); it is an attempt to provide a logical framework
within which both the elementary entities of HPSG, such as feature structures and types, and the
principles and the rules, can be described. While King (1989) encompasses Pollard and Sag (1987)
in its entirety, it does not provide a characterization of all possible HPSG grammars, nor does it
describe the current formulation of the theory as expressed in Pollard and Sag (1994). A similar
drawback can be found in (Pollard and Moshier, 1990): while it gives a denotational semantics for
a typed feature structures system, it is not directed specifically towards HPSG, and no formulation
of the properties of HPSG grammars is given.



A different work is described in (Carpenter, 1992b); a wide, concise theory of the logic of
typed feature structures is presented, with many variations and applications. Carpenter (1992b)
serves as the main reference point for any attempt to define such formalisms; however, as it does
not concentrate on HPSG per se, no formal definition for it can be found there either.

Not only is a denotational semantics for HPSG required; an operational semantics of the formalism
is missing, too. While some compilers for HPSG were developed (see section 1.2.4), they all rely on
(Pollard and Sag, 1987) and (Pollard and Sag, 1994) as their source for interpreting the formalism,
and as we mentioned above, neither reference is formal enough. Since HPSG is not defined
formally enough, we opted in this work to implement ALE (see below), which is the most common
platform for designing HPSG grammars.



1.2.3 Abstract Machine Techniques 

High-level programming languages, especially ones with dynamic structures, have always been 
hard to develop compilers for. A common technique for overcoming the problems involves the 
notion of an abstract machine. It is a machine that, on one hand, captures the essentials of the 
high-level language in its architecture and its instruction set, such that a compiler from the source 
language to the (abstract) machine language becomes relatively simple to design. On the other 
hand, the architecture must be simple enough for the machine language to be easily interpretable 
by common von Neumann machine languages. This approach also enables the design of portable
front ends for the compilers: as the machine language is abstract, it can be easily interpreted by 
different (concrete) machine languages. 

The design of such an abstract architecture must carefully balance the two,
usually conflicting, requirements: the closer the machine architecture is to common architectures,
the harder it is to develop compilers for it; and on the other hand, if such a machine is too 
complex, then while a compiler for it is easier to produce, it becomes more complicated to execute 
its language on normal architectures. 
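
The division of labour between a compiler and an interpreter for an abstract machine can be
illustrated with a toy example. The sketch below is neither the WAM nor the machine designed in
this work: it assumes a hypothetical stack machine with three instructions (PUSH, ADD, MUL) and a
source language of nested arithmetic expressions, and all names in it are invented for the
illustration.

```python
def compile_expr(expr):
    """Compile a nested expression such as ("+", 1, ("*", 2, 3)) to stack-machine code."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    op, left, right = expr
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Code for both operands, followed by the instruction that combines their results.
    return compile_expr(left) + compile_expr(right) + [(opcode,)]


def run(program):
    """Interpret a list of abstract instructions on a simple operand stack."""
    stack = []
    for instruction in program:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instruction[0] == "ADD" else left * right)
    return stack.pop()
```

The compiler is simple because the instructions mirror the structure of the source language, and
the interpreter is simple because each instruction has an obvious effect on ordinary hardware; only
the interpreter would need to be rewritten to move to a different host.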

Abstract machines were used for various kinds of languages: they date back to the P-Code for
Pascal. Starting from Landin's SECD (Landin, 1964), many compilers for functional languages
were designed this way. When logic programming languages appeared, such techniques were applied
to them as well. While Prolog has gained recognition as a practical implementation of the idea of
programming in logic, a method for interpreting the declarative logical statements was needed for
such an implementation to be well-founded. In 1983 David Warren designed an abstract machine
for the execution of Prolog, consisting of a memory architecture and a set of instructions (Warren,
1983; Aït-Kaci, 1991). Even though there were prior attempts to construct both interpreters and
compilers for Prolog, it was the Warren Abstract Machine (WAM) that gave Prolog not only a
good, efficient compiler, but, perhaps more importantly, an elegant operational semantics.

The WAM consists of an architecture of the machine, augmented by a compiler from Prolog 
to the instruction set of the abstract machine. The operational semantics of each instruction is 
defined using a low-level language that can be trivially mapped to any ordinary hardware. In fact, 
there is even a formal verification of the correctness of the WAM compiler (Russinoff, 1992). The
WAM captures in an elegant way the substantial elements of Prolog. First-order term unification 
is supported by special data structures and instructions of the machine architecture. Several 
instructions that deal with control issues implement the backtracking mechanism. 

The WAM immediately became the starting point for many compiler designs for Prolog. The
techniques it delineates serve not only for Prolog proper, but also for constructing compilers for
related languages. To list just a few examples, abstract machine techniques were used for a parallel
Prolog compiler (Hermenegildo, 1986), for variants of Prolog that use different resolution methods
(Swift and Warren, 1993), extend Prolog with types (Beierle and Meyer, 1994) or with record
structures (Smolka and Treinen, 1994), and for a general theorem prover (Schumann, 1991). There
have even been attempts to construct a methodology for the design of abstract machines for logic
programming languages (Kursawe, 1987; Nilsson, 1993).



1.2.4 Processing HPSG 

Linguistic formalisms provide means for describing the structure of natural languages; they do not 
specify methods for determining whether a given string is indeed a member of the language defined 
by a grammar; nor do they prescribe ways for computing the structure that the grammar assigns 
to the permissible strings. These tasks are performed by parsing algorithms. Different parsing 
algorithms exist for various classes of languages, both formal (see (Aho and Ullman, 1972) for a
survey) and natural (see, e.g., (Gazdar and Mellish, 1989; Sikkel, 1993; Pereira and Warren, 1983)).
In this work we implement a simple chart parsing algorithm; such parsers were first introduced
by Kaplan (1973) and Kay (1973) and are widely used nowadays.
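
To make the idea of a chart concrete, here is a minimal sketch of a bottom-up chart recognizer. It
is a CKY-style recognizer for a plain context-free grammar with atomic categories, binary rules and
a lexicon, and is therefore much simpler than parsing with feature structures; it is not the
algorithm implemented in this work. The function name recognize and the data layout are assumptions
made for the illustration.

```python
def recognize(words, lexicon, rules, start="S"):
    """Return True if the word sequence is derivable from the start category.

    lexicon maps a word to the set of its categories; rules maps a pair
    (left category, right category) to the set of categories it can form.
    chart[(i, j)] holds every category that spans words i .. j-1.
    """
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(lexicon.get(word, ()))
    for width in range(2, n + 1):           # build longer spans from shorter ones
        for i in range(0, n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):       # every way of splitting the span in two
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        cell |= rules.get((left, right), set())
            chart[(i, j)] = cell
    return n > 0 and start in chart[(0, n)]
```

Each chart entry is computed once and reused by every larger span that contains it, which is the
property that chart parsers exploit.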

Various parsers for HPSG have been designed in the past, some of which compile their input 
grammars into an executable program. The first work is described in (Proudian and Pollard, 1985);
it is an implementation of a very early version of HPSG. For instance, most of the features are 
limited to accommodate only a small set of atomic values. Rules are specified in a way reminiscent 
of GPSG rules. This work cannot be considered as reflecting HPSG today. 

Franz has implemented an HPSG parser in LISP (Franz, 1990). This parser was designed in
accord with Pollard and Sag (1987), and does not cover the modifications introduced by Pollard
and Sag (1994). It is rather limited, for example by allowing only tree-shaped type hierarchies to
be defined: no multiple inheritance is permitted. While a specific HPSG grammar for English
is a part of this implementation, the system can be used as a framework for developing different
grammars. According to Franz's reports, the parser is very slow, even when used on a limited
grammar and short inputs: example sentences were parsed in 12-65 seconds.

A different implementation is HPSG-PL (Popowich and Vogel, 1991; Kodric, Popowich, and
Vogel, 1992). This system allows more complex type hierarchies to be defined; it enables the
definition of grammar rules, principles and lexical rules, and an HPSG grammar for English is
supplied, based on Pollard and Sag (1987). It incorporates a chart parser where the parsing
algorithm makes specific use of some grammar features (e.g., HEAD-DTR), and thus the stipulation
of rules does not involve explicit phrase structure. The grammar is compiled into a Prolog program
where each feature structure is transformed to a fixed-arity list. Yet the performance of the parser
is rather poor: according to (Popowich and Vogel, 1991), simple sentences take 1-25 seconds of CPU
time to parse.



Another system that was adapted for HPSG is Unicorn (Gerdemann and Hinrichs, 1988).
Originating as a generalization of a context-free parser, this system uses Shieber's extension to
Earley's algorithm, thus enabling the definition of various augmented context-free grammars. An
HPSG grammar was defined in terms of the Unicorn framework, with some divergence from Pollard
and Sag (1994), by Russell (1993). The most important aspect in which Unicorn differs from the
above mentioned parsers is that it does not incorporate a typing system for feature structures at
all. This system was not specifically designed with HPSG implementation in mind, and the grammar
was not intended to be complete in any sense, as it was used only as part of a more complex
project. We have no reports on the results of this parser for HPSG.

It is important to note that HPSG falls naturally into the class of general constraint systems, 
and thus the problem of providing the correct structure for an input sentence can be naturally 
reduced to the problem of solving a constraint system. Many general constraint solvers have been 
developed recently that were used for linguistic applications, including some for which HPSG
grammars were designed. A typical representative is ALE (Carpenter, 1992a). Not being specifically
designed for HPSG, this system is a general Attribute Logic Engine incorporating a chart parser
with a formalism for specifying relations among typed feature structures in a way that enables
encoding of HPSG grammars in a very natural manner. In fact, an HPSG grammar for English
has been constructed in this framework by Penn (1993) that covers most of Pollard and Sag (1994).
Compilation of ALE programs generates rather efficient Prolog code.

A very similar project is Troll (Gerdemann, 1993). It is a framework for processing typed
feature structures, much in the same way as ALE does, albeit with a slightly different underlying
theory. As Troll is still in preliminary phases, not much is reported regarding its use. Another
work, aiming at covering as many as possible of the extensions to simple unification formalisms,
is CUF (Dörre and Dorna, 1993). This system is still under development. Two more general
systems that were used for developing HPSG grammars are TFS (Zajac, 1992), which is a general
constraint solver, and ProFIT (Erbach, 1994), which simply compiles TFS-based specifications
to Prolog.



1.2.5 Computational Grammars for Hebrew 

The Hebrew language poses some interesting problems for the grammar designer. The Hebrew
script¹ is highly ambiguous, a fact that results in many part-of-speech tags for almost every
word (Ornan, 1994). Another problem of the script is that short prepositions, articles and
conjunctions are usually attached to the words that immediately succeed them, which makes it harder
to parse the input sentences. In addition to these two features, the Hebrew morphology is very
rich. A noun base form might have over fifteen different derivations, and a verb base form more
than thirty. All these call for some pre-processing of the input to the parser. Disambiguation of
the script, as well as morphological analysis, were covered by different works (Bentur, Angel, and
Segev, 1992; Choueka and Ne'eman, 1995; Ornan and Katz, 1995); some major decisions have to
be taken, including the representation of the Hebrew script and the treatment of morphological
analysis. As the current trend is to use constraint-based formalisms for tasks other than syntax
and semantics, this is the approach we choose.



¹ We refer here to the non-vocalized script which is in everyday use, and not to the vocalized script that is used
for special purposes (such as poems or children's books) only.
At the syntactic level, Hebrew exhibits a rather free constituent order, although many
constraints are placed on the order of words within constituents. The use of agreement features in
Hebrew is more extensive than in, say, English. For example, nouns and adjectives must agree on
number, gender and definiteness. Agreement checking becomes more complicated in coordinated 
constructs (see (Wintner, 1991)) and much thought must be given to the correct treatment of 
agreement. 

There have been some attempts to provide a computational grammar for (the syntax of) the
Hebrew language. The first work was done by Cohen (1984), who wrote a special software
system for performing both the morphological and the syntactic analysis of Hebrew sentences. This
work was very preliminary and its coverage was limited. Another preliminary work is described in
(Nirenburg and Ben-Asher, 1984): it is a small-scale ATN for Hebrew, capable of recognizing very
limited structures. A transformation-based grammar is suggested in (Chayen and Dror, 1976).

Unification-based formalisms were used for developing Hebrew grammars only recently. A very
limited experiment was done using PATR-II (Wintner, 1992) but was later extended (Wintner
and Ornan, 1991; Wintner and Ornan, 199^) to a reasonable subset of the language, on a more
convenient platform: Tomita's LR Parser/Compiler, which is based on LFG. The grammar is
capable of recognizing sentences of rather wide variety and complexity, but produces only the
syntactic structures of the input sentences. See (Wintner, 1991) for a detailed discussion. A
different work along the same lines is (Yizhar, 1993): it uses the same framework but concentrates
on the syntax of NPs in Hebrew, employing ideas from different linguistic theories. All in all, no
broad-coverage, efficient, concise computational grammar for Hebrew exists.



1.3 Achievements of the Thesis 

The main objective of this work was to formally define an operational semantics for a unification- 
based grammar formalism, suitable for specifying HPSG grammars, through the use of an abstract 
machine. To this end we have first conducted a theoretical investigation into the properties of such 
formalisms. The main contributions of this endeavor are: 

• Formalization and explication of the notion of multi-rooted feature structures (MRSs) that 
are used implicitly in the computational linguistics literature; 

• Concise definitions of a TFS-based linguistic formalism, based on abstract MRSs; 

• Algebraic specification of a parsing step operator, T_{G,w}, that induces algebraic semantics for
this formalism; 



• Treatment of parsing as a model for computation, assigning operational semantics to the 
linguistic formalism; 

• Specification and correctness proofs for parsing in this framework; 

• A new definition for off-line parsability, less strict than the existing one, and termination 
proof for off-line parsable grammars. 



This more theoretical work was presented as (Wintner and Francez, 1995b). The off-line parsability 
result is presented in (Wintner and Francez, to appear). 

Once the theoretical background was set, we designed AMALIA, an abstract machine
for unification-based grammars. This is the first application of abstract machine techniques to
a linguistic formalism. The core engine of the machine was presented in (Wintner and Francez,
1995a), and a more detailed presentation is in preparation. The machine is accompanied by
a compiler from the ALE specification language to the machine instructions, an interpreter for
the machine instructions and a debugger for machine language programs. The abstract machine
endows natural language grammars with an operational meaning; furthermore, its use results in
highly efficient processing: the compiled grammars are executed much faster than with the existing
ALE processor. Some tests we have conducted showed a speed-up of a factor of 20 in compilation
time, and a factor of 5-15 in execution time.

In order to test the machine on a realistic application, we have developed a small-scale,
HPSG-based grammar for a fragment of the Hebrew language, using AMALIA as the development
platform. This is the first application of HPSG to a Semitic language.

Another track of research we are exploring (with Evgeniy Gabrilovich) is the adaptation of
AMALIA to perform natural language generation, as opposed to parsing. Based upon the
algorithm of (Samuelsson, 1995), a characterization of generation with unification-based grammars
can be found in (Gabrilovich, In preparation). A separate compiler is constructed, based upon
AMALIA's compiler, that transforms a grammar to an inverted, normalized form, more suited
for generation. To execute the inverted grammar on AMALIA, very few modifications in the
machine are needed. Once this project is completed, AMALIA will become a unified framework
for processing grammars, supporting both parsing and generation.



1.4 Structure of this Document 



In chapter 2 a theory of parsing with typed feature structures is presented. We start with a survey
of the theory of TFSs, along the lines of Carpenter (1992b), but we extend it to multi-rooted
structures. In particular, we discuss the computational properties of TFS-based
grammars and show in section 2.10.4 a condition on grammars that guarantees termination of
parsing.

Chapter 3 describes the abstract machine itself, starting with its core, aimed at unifying two feature 
structures. In section 3.3 this engine is enveloped with control structures to accommodate 
parsing. We conclude this chapter with a discussion of some implementation details. 

The HPSG-based grammar for Hebrew is described in a separate chapter, followed by conclusions 
and suggestions for further research. The appendices list all the machine instructions and the 
Hebrew grammar. 
Chapter 2 



Parsing with Typed Feature 
Structures 



This chapter provides the theoretical background for the design of the machine: we discuss below 
the theory of typed feature structures and the details of parsing with grammars that are based upon 
them. Section 2.1 outlines the theory of TFSs of (Carpenter, 1991; Carpenter, 1992b). We repeat it 
here in order to make this document as self-contained as possible. However, the well-foundedness 
result (section 2.3) is an original contribution. We deviate from the presentation of Carpenter 
(1992b) in section 2.4, where we emphasize abstract typed feature structures (AFSs). Encoding 
the essential information of TFSs, AFSs were introduced by Moshier (1988), but we use a different 
presentation that is suited for typed feature structures. Unification is defined over AFSs rather 
than TFSs. Section 2.7 introduces an explicit construct of multi-rooted feature structures (MRSs) 
that naturally extend TFSs, used to represent phrasal signs as well as grammar rules. Abstraction 
is extended to MRSs and the mathematical foundations needed for manipulating them are given. 
The concepts of grammars and the languages they generate are formally defined in section 2.8, 
and the TFS-based formalism thus acquires a denotational semantics. Following that, a model 
for computation, corresponding to bottom-up chart parsing for the formalism, is presented. The 
TFS-based formalism is thus endowed with an operational semantics. Next, we prove that both 
semantics coincide. Finally, we discuss the class of grammars for which computations terminate. We 
give a more relaxed definition for off-line parsability and prove that termination is guaranteed for 
off-line parsable grammars. The presentation is accompanied by a running example of a grammar 
and the parsing process it induces. 



2.1 Theory of Feature Structures 



The first part of this section summarizes some preliminary notions along the lines of (Carpenter, 
1992b). For the following discussion we fix non-empty, finite, disjoint sets Types and Feats of 
types and feature names, respectively. We assume that the set Feats is totally ordered. We 
also fix an infinite set Nodes of nodes, disjoint from Types and Feats, each member of which is 
decorated by a type from Types through a fixed typing function $\theta : \textsc{Nodes} \to \textsc{Types}$. The set 
Nodes is 'rich' in the sense that for every $t \in \textsc{Types}$, the set $\{q \in \textsc{Nodes} \mid \theta(q) = t\}$ is infinite. 
Below, the metavariable $T$ ranges over subsets of types, $t$ over types, $f$ over features and $q$ 
over nodes. When dealing with partial functions the notation '$F(x)\downarrow$' means that $F$ is defined 
for the value $x$ and the symbol '$\uparrow$' means undefinedness. Whenever the result of an application of 
a partial function is used as an operand, it is meant that the function is defined for its arguments. 

Definition 2.1.1 (Type hierarchy) A partial order $\sqsubseteq$ over $\textsc{Types} \times \textsc{Types}$ is a type hierarchy 
(or inheritance hierarchy) if it is bounded complete, i.e., if every up-bounded subset $T$ of 
Types has a (unique) least upper bound, $\sqcup T$, referred to as the unification of the types in $T$. 

If $t_1 \sqsubseteq t_2$ we say that $t_1$ subsumes, or is more general than, $t_2$; $t_2$ is a subtype of (more 
specific than) $t_1$. 

Let $\bot = \sqcup \emptyset$ be the most general type. Let the most specific type be $\top = \sqcup \textsc{Types}$. If $\sqcup T = \top$ 
we say that $T$ is inconsistent. Let $\sqcap T = \sqcup \{t' \mid t' \sqsubseteq t \text{ for every } t \in T\}$ be the greatest lower bound 
of the set $T$. 
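
As an illustration of bounded completeness, the following Python sketch computes least upper bounds in a small hierarchy. The hierarchy itself, the type names (`bot`, `top`, `agr`, and so on) and the function names are assumptions made for this example only; they do not come from the formalism or from any existing implementation.

```python
# Hypothetical hierarchy: each type maps to its immediate subtypes.
# "bot" plays the role of the most general type, "top" of the inconsistent one.
IMMEDIATE_SUBTYPES = {
    "bot": {"agr", "sign"},
    "agr": {"sg", "pl"},
    "sign": {"word", "phrase"},
    "sg": {"top"}, "pl": {"top"}, "word": {"top"}, "phrase": {"top"},
    "top": set(),
}


def more_specific(t):
    """Return every type subsumed by t (reflexive and transitive)."""
    seen, stack = {t}, [t]
    while stack:
        for s in IMMEDIATE_SUBTYPES[stack.pop()]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def subsumes(t1, t2):
    """True iff t1 is at least as general as t2."""
    return t2 in more_specific(t1)


def lub(types):
    """Least upper bound (unification) of a set of types."""
    bounds = set(IMMEDIATE_SUBTYPES)
    for t in types:
        bounds &= more_specific(t)          # keep only common upper bounds
    least = [b for b in bounds if all(subsumes(b, c) for c in bounds)]
    if len(least) != 1:
        raise ValueError("hierarchy is not bounded complete")
    return least[0]
```

In this toy hierarchy `lub(set())` is `bot`, `lub({"agr", "sg"})` is `sg`, and `lub({"sg", "pl"})` is `top`, so the set `{sg, pl}` is inconsistent in the sense of the definition.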



Definition 2.1.2 (Feature structures) A (typed) feature structure (TFS) is a directed, 
connected, labeled graph consisting of a finite, nonempty set of nodes $Q \subseteq \textsc{Nodes}$, a root $\bar{q} \in Q$, 
and a partial function $\delta : Q \times \textsc{Feats} \to Q$ specifying the arcs such that every node $q \in Q$ is 
accessible from $\bar{q}$. 

The nodes of a feature structure are thus labeled by types while the arcs are labeled by feature 
names. The root $\bar{q}$ is a distinguished node from which all other nodes are reachable. A feature 
structure is of type $t$ when $\theta(\bar{q}) = t$. When we say that a feature structure $A$ exists we mean that 
no node of $A$ is typed $\top$. 

Let FS be the collection of all feature structures over the given Feats and Types. We use 
upper-case letters (with or without tags, subscripts etc.) to refer to feature structures. We use 
$Q, \bar{q}, \delta$ (with the same tags or subscripts) to refer to constituents of feature structures. Figure 2.1 
depicts an example feature structure, represented both as an Attribute-Value Matrix (AVM) and 
as a graph. 



[Figure: graph representation and AVM representation of the same feature structure, rooted in a node of type phrase, with arcs labeled SYN, SUBJ, HEAD and AGR.] 

Figure 2.1: A feature structure 

Note that all feature structures are, by definition, graphs. Some grammatical formalisms used 
to have a special kind of feature structures, namely atoms; atoms are represented in our framework 
as nodes with no outgoing edges. For a discussion regarding the implications of such an approach, 
refer to (Carpenter, 1992b, Chapter 8). 
Definition 2.1.3 (Appropriateness) An appropriateness specification over the type hierarchy 
and the set Feats is a partial function $Approp : \textsc{Feats} \times \textsc{Types} \to \textsc{Types}$, such that: 

• let $T_f = \{t \in \textsc{Types} \mid Approp(f,t)\downarrow\}$; then for every $f \in \textsc{Feats}$, $T_f \neq \emptyset$ and $\sqcap T_f \in T_f$. 

• if $Approp(f,t_1)\downarrow$ and $t_1 \sqsubseteq t_2$ then $Approp(f,t_2)\downarrow$ and $Approp(f,t_1) \sqsubseteq Approp(f,t_2)$. 

i.e., every feature is introduced by some most general type, and is appropriate for all its subtypes; 
and if the appropriate type for a feature in $t_1$ is some type $t$, then the appropriate type of the same 
feature in $t_2$, which is a subtype of $t_1$, must be at least as specific as $t$. 

If $Approp(f,t)\downarrow$ we say that $f$ is appropriate for $t$ and that $Approp(f,t)$ is the appropriate 
type for the feature $f$ in the type $t$. The set of features appropriate for some type is ordered (since 
Feats is ordered). 

Definition 2.1.4 (Well-typed feature structures) A feature structure $(Q, \bar{q}, \delta)$ is well typed 
iff for every $q \in Q$, $\theta(q) \neq \top$ and for all $f \in \textsc{Feats}$ and $q \in Q$, if $\delta(q,f)\downarrow$ then $Approp(f,\theta(q))\downarrow$ 
and $Approp(f,\theta(q)) \sqsubseteq \theta(\delta(q,f))$. 

i.e., if an arc labeled $f$ connects two nodes, then $f$ is appropriate for the type of the source node; 
and the appropriate type for $f$ in the type of the source node subsumes the type of the target node. 

Definition 2.1.5 (Total well-typedness) A feature structure is totally well-typed iff it is 
well typed and for all $f \in \textsc{Feats}$ and $q \in Q$, if $Approp(f,\theta(q))\downarrow$ then $\delta(q,f)\downarrow$. 

i.e., every feature which is appropriate for the type labeling some node labels an outgoing arc from 
that node. 
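
The two conditions can be checked mechanically. The sketch below is only an illustration: the signature (`SUBTYPES`, `APPROP`), the dictionary encoding of $\theta$ and $\delta$, and the function names are all assumptions of this example. The inconsistent type $\top$ is simply left out of the toy hierarchy, so the requirement $\theta(q) \neq \top$ holds trivially.

```python
# Hypothetical signature. SUBTYPES[t] is the set of types subsumed by t.
SUBTYPES = {
    "bot": {"bot", "agr", "sg", "sign"},
    "agr": {"agr", "sg"},
    "sg": {"sg"},
    "sign": {"sign"},
}
# APPROP[(feature, type)] is the appropriate type; a missing key means "undefined".
APPROP = {("AGR", "sign"): "agr"}


def subsumes(t1, t2):
    return t2 in SUBTYPES[t1]


def well_typed(theta, delta):
    """theta maps nodes to types, delta maps (node, feature) pairs to nodes."""
    for (q, f), target in delta.items():
        required = APPROP.get((f, theta[q]))
        # The feature must be appropriate for the source type, and the
        # appropriate type must subsume the type of the target node.
        if required is None or not subsumes(required, theta[target]):
            return False
    return True


def totally_well_typed(theta, delta):
    """Well typed, and every appropriate feature labels an outgoing arc."""
    if not well_typed(theta, delta):
        return False
    return all((q, f) in delta
               for q, t in theta.items()
               for (f, t2) in APPROP if t2 == t)
```

For instance, a `sign` node with an `AGR` arc to an `sg` node is totally well-typed under this signature, while a lone `sign` node is well typed but not totally well-typed.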

Definition 2.1.6 (Appropriateness loops) The appropriateness specification contains a loop 
if there exist $t_1, t_2, \ldots, t_n \in \textsc{Types}$ such that for every $i$, $1 \le i \le n$, there is a feature $f_i \in \textsc{Feats}$ 
such that $Approp(f_i, t_i) = t_{i+1}$, where $t_{n+1} = t_1$. 

Definition 2.1.7 (Paths) A path is a finite sequence of feature names, and the set $\textsc{Paths} = 
\textsc{Feats}^*$ is the collection of paths. We use $\pi, \alpha$ (with or without subscripts) to refer to paths; $\epsilon$ is 
the empty path. The definition of $\delta$ is extended to paths in the natural way: 

$\delta(q, \epsilon) = q$ 

$\delta(q, f\pi) = \delta(\delta(q, f), \pi)$ 

The paths of a feature structure $A$ are $\Pi(A) = \{\pi \mid \pi \in \textsc{Paths} \text{ and } \delta(\bar{q}_A, \pi)\downarrow\}$. 

Definition 2.1.8 (Cycles) A feature structure $A = (Q, \bar{q}, \delta)$ is cyclic if there exist a non-empty 
path $\alpha \in \textsc{Paths}$ and a node $q \in Q$ such that $\delta(q, \alpha) = q$. It is acyclic otherwise. 

Definition 2.1.9 (Path values) The value of a path $\pi$ in a feature structure $A = (Q, \bar{q}, \delta)$, 
denoted by $val(A, \pi)$, is non-trivial if and only if $\delta(\bar{q}, \pi)\downarrow$, in which case it is a feature structure 
$A' = (Q', \bar{q}', \delta')$, where: 

• $\bar{q}' = \delta(\bar{q}, \pi)$ 

• $Q' = \{q' \mid \text{there exists a path } \pi' \text{ such that } \delta(\bar{q}', \pi') = q'\}$ ($Q'$ is the set of nodes reachable 
from $\bar{q}'$) 

• for every feature $f$ and for every $q' \in Q'$, $\delta'(q', f) = \delta(q', f)$ ($\delta'$ is the restriction of $\delta$ to $Q'$) 

If $\delta(\bar{q}, \pi)\uparrow$, $val(A, \pi)$ is defined to be a single node whose type is $\top$. 

Definition 2.1.10 (Reentrancy) A feature structure $A$ is reentrant iff there exist two different 
paths $\pi_1, \pi_2$ such that $\delta(\bar{q}, \pi_1) = \delta(\bar{q}, \pi_2)$. In this case the two paths are said to share the same 
value. 
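
The notions of paths, path values, cycles and reentrancy translate directly to operations on a graph. The following Python sketch is one possible rendering, assuming (for this example only) that $\delta$ is stored as a dictionary from `(node, feature)` pairs to nodes and that `None` stands in for the trivial value; all function names are hypothetical.

```python
def follow(delta, q, path):
    """Extension of delta to paths: the node reached from q, or None if undefined."""
    for f in path:
        q = delta.get((q, f))
        if q is None:
            return None
    return q


def reachable(delta, root):
    """Set of nodes accessible from root."""
    seen, stack = {root}, [root]
    while stack:
        q = stack.pop()
        for (src, _), tgt in delta.items():
            if src == q and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen


def value(delta, root, path):
    """val(A, path) as (new root, restricted delta); None for the trivial value."""
    q = follow(delta, root, path)
    if q is None:
        return None
    nodes = reachable(delta, q)
    return q, {(s, f): t for (s, f), t in delta.items() if s in nodes}


def is_cyclic(delta, root):
    """True iff some non-empty path leads from a node back to itself."""
    on_stack, done = set(), set()

    def visit(q):
        on_stack.add(q)
        for (src, _), tgt in delta.items():
            if src != q:
                continue
            if tgt in on_stack or (tgt not in done and visit(tgt)):
                return True
        on_stack.discard(q)
        done.add(q)
        return False

    return visit(root)


def is_reentrant(delta, root):
    """True iff two different paths lead from the root to the same node."""
    nodes = reachable(delta, root)
    indegree = {q: 0 for q in nodes}
    for (src, _), tgt in delta.items():
        if src in nodes:
            indegree[tgt] += 1
    # A node with two incoming arcs, or an arc back into the root,
    # yields two distinct paths with the same value.
    return indegree[root] > 0 or any(d > 1 for d in indegree.values())
```

The reentrancy test relies on the observation that paths are unique exactly when the reachable part of the graph is a tree rooted at the root.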

2.2 Subsumption 

Definition 2.2.1 (Subsumption) $A_1 = (Q_1, \bar{q}_1, \delta_1)$ subsumes $A_2 = (Q_2, \bar{q}_2, \delta_2)$ (denoted by 
$A_1 \sqsubseteq A_2$) iff there exists a total function $h : Q_1 \to Q_2$, called a subsumption morphism, such 
that 

• $h(\bar{q}_1) = \bar{q}_2$ 

• for every $q \in Q_1$, $\theta(q) \sqsubseteq \theta(h(q))$ 

• for every $q \in Q_1$ and for every $f$ such that $\delta_1(q, f)\downarrow$, $h(\delta_1(q, f)) = \delta_2(h(q), f)$. 

$A_1 \sqsubset A_2$ iff $A_1 \sqsubseteq A_2$ and $A_1 \neq A_2$. 

$h$ associates with every node in $Q_1$ a node in $Q_2$ with at least as specific a type; moreover, if an 
arc labeled $f$ connects $q$ with $q'$, then such an arc connects $h(q)$ with $h(q')$. If $A \sqsubseteq B$ then every 
path defined in $A$ is defined in $B$, and if two paths are reentrant in $A$ they are reentrant in $B$. 

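Lemma 2.2.2 If $A \sqsubseteq B$ then $\Pi(A) \subseteq \Pi(B)$. 

Lemma 2.2.3 If $A \sqsubseteq B$ then for every $\pi_1, \pi_2 \in \Pi(A)$, $\delta_A(\bar{q}_A, \pi_1) = \delta_A(\bar{q}_A, \pi_2)$ implies that 
$\delta_B(\bar{q}_B, \pi_1) = \delta_B(\bar{q}_B, \pi_2)$. 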
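
Since a subsumption morphism must map root to root and commute with arcs, it is determined by the two structures whenever it exists. The sketch below exploits this to test subsumption by a single traversal. The encoding of a TFS as a triple `(root, theta, delta)`, the toy hierarchy and all names are assumptions of this example.

```python
# Hypothetical hierarchy: SUBTYPES[t] is the set of types subsumed by t.
SUBTYPES = {
    "bot": {"bot", "agr", "sg"},
    "agr": {"agr", "sg"},
    "sg": {"sg"},
}


def subsumes(t1, t2):
    return t2 in SUBTYPES[t1]


def subsumption_morphism(a, b):
    """Return the morphism h as a dict if a subsumes b, and None otherwise."""
    (root_a, theta_a, delta_a), (root_b, theta_b, delta_b) = a, b
    h, stack = {root_a: root_b}, [root_a]
    while stack:
        q = stack.pop()
        if not subsumes(theta_a[q], theta_b[h[q]]):
            return None                      # type condition violated
        for (src, f), tgt in delta_a.items():
            if src != q:
                continue
            image = delta_b.get((h[q], f))
            if image is None:
                return None                  # the arc is missing in b
            if tgt in h:
                if h[tgt] != image:
                    return None              # a reentrancy of a is absent in b
            else:
                h[tgt] = image
                stack.append(tgt)
    return h
```

A usage example: with `A` having two distinct `agr` values under `F` and `G`, and `B` sharing a single `sg` value under both, `A` subsumes `B` but not the other way around.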

2.3 Well-Foundedness of Subsumption 

Definition 2.3.1 A partial order $\succ$ on $D$ is well-founded iff there exists no infinite decreasing 
sequence $d_0 \succ d_1 \succ d_2 \succ \cdots$ of elements of $D$. 

We prove below that subsumption of TFSs is well-founded iff they are acyclic. 

Lemma 2.3.2 A TFS $A$ is cyclic iff $\Pi(A)$ is infinite. 

Proof: If $A$ is cyclic, there exist a node $q \in Q$ and a non-empty path $\alpha$ such that $\delta(q, \alpha) = q$. Let $\pi$ 
be such that $q = \delta(\bar{q}, \pi)$; then the infinite set of paths $\{\pi\alpha^i \mid i \ge 0\}$ is contained in $\Pi(A)$. If $\Pi(A)$ 
is infinite then, since $Q$ is finite, there exists a node $q \in Q$ such that $\delta(\bar{q}, \pi_i) = q$ for an infinite number of 
different paths $\pi_i$. Since Feats is finite, the out-degree of every node in $Q$ is finite; hence $q$ must 
be part of a cycle. 
Definition 2.3.3 (Rank) Let $r : \textsc{Types} \to \mathbb{N}$ be a total function such that $r(t) < r(t')$ if $t \sqsubset t'$. 
For an acyclic TFS $A$, let $\Delta(A) = |\Pi(A)| - |Q_A|$ and let $\Theta(A) = \sum_{\pi \in \Pi(A)} r(\theta(\delta(\bar{q}, \pi)))$. Define a 
rank for acyclic TFSs: $rank(A) = \Delta(A) + \Theta(A)$. 



By lemma 2.3.2, $rank$ is well defined for acyclic TFSs. $\Delta(A)$ can be thought of as the number of 
reentrancies in $A$, or the number of different paths that lead to the same node in $A$. For every 
acyclic TFS $A$, $\Delta(A) \ge 0$ and hence $rank(A) \ge 0$. 
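
To make the rank concrete, the following sketch computes it for small acyclic structures. The function `r` is supplied as a dictionary chosen for the example, the TFS encoding (`root`, `theta`, `delta`) is an assumption, and the code presupposes that every node in `theta` is reachable from the root and that the structure is acyclic (it would not terminate on a cyclic one).

```python
def count_paths_and_weight(root, theta, delta, r):
    """Enumerate the paths of an acyclic TFS; returns (|Pi(A)|, Theta(A))."""
    n_paths, weight, stack = 0, 0, [root]
    while stack:                      # one stack entry per path, not per node
        q = stack.pop()
        n_paths += 1
        weight += r[theta[q]]
        stack.extend(t for (s, _), t in delta.items() if s == q)
    return n_paths, weight


def rank(root, theta, delta, r):
    """rank(A) = (|Pi(A)| - |Q_A|) + Theta(A) for an acyclic TFS."""
    n_paths, weight = count_paths_and_weight(root, theta, delta, r)
    return (n_paths - len(theta)) + weight


# Hypothetical example: the same shape with and without a shared AGR value.
r = {"phrase": 1, "head": 1, "agr": 1}
tree = (0, {0: "phrase", 1: "head", 2: "head", 3: "agr", 4: "agr"},
        {(0, "SUBJ"): 1, (0, "HEAD"): 2, (1, "AGR"): 3, (2, "AGR"): 4})
shared = (0, {0: "phrase", 1: "head", 2: "head", 3: "agr"},
          {(0, "SUBJ"): 1, (0, "HEAD"): 2, (1, "AGR"): 3, (2, "AGR"): 3})
```

Here `rank(*tree, r)` is 5 and `rank(*shared, r)` is 6: both structures have five paths, but the reentrant one has one node fewer, so its $\Delta$ component is larger by one.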

Lemma 2.3.4 If $A \sqsubset B$ and both are acyclic then $rank(A) < rank(B)$. 

Proof: Assume $A \sqsubset B$ and both are acyclic; hence by lemma 2.2.2, $\Pi(A) \subseteq \Pi(B)$ and by 
lemma 2.3.2 both are finite. Let $h : Q_A \to Q_B$ be a subsumption morphism. 

1. If $\Pi(A) = \Pi(B)$ then $|\Pi(A)| = |\Pi(B)|$. $A \sqsubset B$, hence either there exists a node $q \in Q_A$ such that 
$\theta(q) \sqsubset \theta(h(q))$, and hence $\Theta(A) < \Theta(B)$ (while $\Delta(A) \le \Delta(B)$); or (by lemma 2.2.3) there 
exist two paths $\pi_1, \pi_2$ such that $\delta_A(\bar{q}_A, \pi_1) \neq \delta_A(\bar{q}_A, \pi_2)$, but $\delta_B(\bar{q}_B, \pi_1) = \delta_B(\bar{q}_B, \pi_2)$, in which 
case $\Delta(A) < \Delta(B)$ (while $\Theta(A) \le \Theta(B)$). In any case, $rank(A) < rank(B)$. 

2. If $\Pi(A) \subset \Pi(B)$ then $|\Pi(A)| < |\Pi(B)|$; as above, $\Theta(A) \le \Theta(B)$. However, it might be the case 
that $|Q_A| < |Q_B|$. But for every node $q \in Q_B$ that is not the image of any node in $Q_A$, there 
exists a path $\pi$ such that $\delta(\bar{q}_B, \pi) = q$ and $\pi \notin \Pi(A)$. Hence $|\Pi(A)| - |Q_A| \le |\Pi(B)| - |Q_B|$, 
and $rank(A) < rank(B)$. 

Theorem 2.3.5 Subsumption of TFSs is not well-founded. 

Proof: Consider the infinite sequence of TFSs $A_0, A_1, \ldots$ depicted graphically in figure 2.2. For 
every $i \ge 0$, $A_i \sqsupset A_{i+1}$: to see that, consider the morphism $h$ that maps $\bar{q}_{i+1}$ to $\bar{q}_i$ and $\delta_{i+1}(q, f)$ 
to $\delta_i(h(q), f)$ (i.e., the first $i + 1$ nodes of $A_{i+1}$ are mapped to the first $i + 1$ nodes of $A_i$, and the 
additional node of $A_{i+1}$ is mapped to the last node of $A_i$). Thus there exists a decreasing infinite 
sequence of cyclic TFSs and subsumption is not well-founded. 

Theorem 2.3.6 Subsumption of acyclic TFSs is well-founded. 



Proof: For every acyclic TFS $A$, $rank(A)$ is finite and $rank(A) \ge 0$. By lemma 2.3.4, if $A \sqsubset B$ 
then $rank(A) < rank(B)$. If an infinite decreasing sequence of acyclic TFSs existed, $rank$ would 
have mapped them to an infinite decreasing subsequence of $\mathbb{N}$, which is a contradiction. Hence 
subsumption is well-founded. 

2.4 Abstract Feature Structures 

The essential properties of a feature structure, excluding the identities of its nodes, can be captured 
by three components: the set of paths, the type that is assigned to every path, and the sets of paths 
that lead to the same node. In this section we elaborate on ideas presented in (Moshier and Rounds, 
1987; Moshier, 1988); in contrast to the approach pursued in (Carpenter, 1992b), we first define 
abstract feature structures and then show their relation to concrete ones. The representation of 
graphs as sets of paths is inspired by works on the semantics of concurrent programming languages, 
and the notion of fusion-closure is due to (Emerson, 1983). 



[Figure: the TFSs $A_0$, $A_1$, $A_2$, $A_3$ of the sequence, drawn as graphs.] 

Figure 2.2: An infinite decreasing sequence of TFSs 



Definition 2.4.1 (Alphabetic variants) Two feature structures $A$ and $B$ are alphabetic variants 
($A \sim B$) iff $A \sqsubseteq B$ and $B \sqsubseteq A$. 

Alphabetic variants have exactly the same structure, and corresponding nodes have the same types. 
Only the identities of the nodes distinguish them. 

Definition 2.4.2 (Abstract feature structures) A pre-abstract feature structure (pre-AFS) 
is a triple $(\Pi, \Theta, \approx)$, where 

• $\Pi \subseteq \textsc{Paths}$ is a non-empty set of paths 

• $\Theta : \Pi \to \textsc{Types}$ is a total function, assigning a type to every path 

• $\approx \subseteq \Pi \times \Pi$ is a relation specifying reentrancy. 

An abstract feature structure (AFS) is a pre-AFS $A$ for which the following requirements hold: 

• $\Pi$ is prefix-closed: if $\pi\alpha \in \Pi$ then $\pi \in \Pi$ (where $\pi, \alpha \in \textsc{Paths}$) 

• $A$ is fusion-closed: if $\pi\alpha \in \Pi$ and $\pi'\alpha' \in \Pi$ and $\pi \approx \pi'$ then $\pi\alpha' \in \Pi$ (as well as $\pi'\alpha \in \Pi$) 
and $\pi\alpha' \approx \pi'\alpha'$ (as well as $\pi'\alpha \approx \pi\alpha$) 

• $\approx$ is an equivalence relation with a finite index (with $[\approx]$ the set of its equivalence classes) 

• $\Theta$ respects the equivalence: if $\pi_1 \approx \pi_2$ then $\Theta(\pi_1) = \Theta(\pi_2)$. 

An AFS $(\Pi, \Theta, \approx)$ is well-typed if $\Theta(\pi) \neq \top$ for every $\pi \in \Pi$ and if $\pi f \in \Pi$ then $Approp(f, \Theta(\pi))\downarrow$ 
and $Approp(f, \Theta(\pi)) \sqsubseteq \Theta(\pi f)$. It is totally well typed if, in addition, for every $\pi \in \Pi$, if 
$Approp(f, \Theta(\pi))\downarrow$ then $\pi f \in \Pi$. 

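Abstract feature structures can be related to concrete ones in a natural way: If $A = (Q, \bar{q}, \delta)$ 
is a TFS then $Abs(A) = (\Pi_A, \Theta_A, \approx_A)$ is defined by: 

• $\Pi_A = \{\pi \mid \delta(\bar{q}, \pi)\downarrow\}$ 

• $\Theta_A(\pi) = \theta(\delta(\bar{q}, \pi))$ 

• $\pi_1 \approx_A \pi_2$ iff $\delta(\bar{q}, \pi_1) = \delta(\bar{q}, \pi_2)$. 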
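
For an acyclic TFS the three components of its abstraction are finite and can be enumerated directly. The sketch below does so under the assumptions of this example only: a TFS is a triple `(root, theta, delta)` with `delta` a dictionary over `(node, feature)` pairs, paths are tuples of feature names, and the equivalence relation is returned as its set of equivalence classes. For a cyclic structure the path set is infinite and this enumeration would not terminate.

```python
def abstract(root, theta, delta):
    """Abstraction of an acyclic TFS: (paths, path -> type, classes of paths)."""
    node_of, stack = {}, [((), root)]
    while stack:
        path, q = stack.pop()
        node_of[path] = q                 # the node that this path leads to
        stack.extend((path + (f,), t)
                     for (s, f), t in delta.items() if s == q)
    big_theta = {p: theta[q] for p, q in node_of.items()}
    classes = {}
    for p, q in node_of.items():          # paths are equivalent iff they share a node
        classes.setdefault(q, set()).add(p)
    return set(node_of), big_theta, {frozenset(c) for c in classes.values()}


# Hypothetical structure: SUBJ and HEAD values share their AGR value.
theta = {0: "phrase", 1: "head", 2: "head", 3: "agr"}
delta = {(0, "SUBJ"): 1, (0, "HEAD"): 2, (1, "AGR"): 3, (2, "AGR"): 3}
paths, big_theta, classes = abstract(0, theta, delta)
```

In this example there are five paths and four equivalence classes; the only class with more than one member contains the paths SUBJ AGR and HEAD AGR.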



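Lemma 2.4.3 If $A$ is a feature structure then $Abs(A)$ is an abstract feature structure. 

Proof: 

1. $\Pi$ is prefix-closed: $\Pi = \{\pi \mid \delta(\bar{q}, \pi)\downarrow\}$. If $\pi\alpha \in \Pi$ then $\delta(\bar{q}, \pi\alpha)\downarrow$ and by the definition of $\delta$, 
$\delta(\bar{q}, \pi)\downarrow$, too. 

2. $Abs(A)$ is fusion-closed: Suppose that $\pi\alpha \in \Pi$, $\pi'\alpha' \in \Pi$ and $\pi \approx \pi'$. Then $\delta(\bar{q}, \pi) = \delta(\bar{q}, \pi')$. 
Hence $\delta(\bar{q}, \pi\alpha')\downarrow$ (therefore $\pi\alpha' \in \Pi$), and $\delta(\bar{q}, \pi\alpha') = \delta(\bar{q}, \pi'\alpha')$, therefore $\pi\alpha' \approx \pi'\alpha'$. In the 
same way, $\pi'\alpha \in \Pi$ and $\pi'\alpha \approx \pi\alpha$. 

3. $\approx$ is an equivalence relation with a finite index: $\pi_1 \approx \pi_2$ iff $\delta(\bar{q}, \pi_1) = \delta(\bar{q}, \pi_2)$, namely iff $\pi_1$ 
and $\pi_2$ lead to the same node (from $\bar{q}$) in $A$. Hence $\approx$ is an equivalence relation and since $Q$ 
is finite, $\approx$ has a finite index. 

4. $\Theta$ respects the equivalence: $\Theta(\pi) = \theta(\delta(\bar{q}, \pi))$ and if $\pi_1 \approx \pi_2$ then $\delta(\bar{q}, \pi_1) = \delta(\bar{q}, \pi_2)$, hence 
$\Theta(\pi_1) = \Theta(\pi_2)$. 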

For the reverse direction, consider an AFS $A = (\Pi, \Theta, \approx)$. First construct a 'pseudo-TFS', 
$Conc(A) = (Q, \bar{q}, \delta)$, that differs from a TFS only in that its nodes are not drawn from the set 
Nodes. Let $Q = \{q_{[\pi]} \mid [\pi] \in [\approx]\}$, making use of the fact that $\approx$ is of finite index. Let 
$\theta(q_{[\pi]}) = \Theta(\pi)$ for every node; since $A$ is an AFS, $\Theta$ respects the equivalence and therefore $\theta$ 
is representative-independent. Let $\bar{q} = q_{[\epsilon]}$ and $\delta(q_{[\pi]}, f) = q_{[\pi f]}$ for every node $q_{[\pi]}$ and feature 
$f$ for which $\pi f \in \Pi$. Since $A$ is fusion-closed, $\delta$ is representative-independent. By injecting $Q$ into Nodes, making 
use of the richness of Nodes, a concrete TFS $Conc(A)$ is obtained, representing the equivalence 
class of alphabetic variants that can be obtained that way. We abuse the notation $Conc(A)$ in the 
sequel to refer to this set of alphabetic variants. Figure 2.3 depicts the abstraction of the example 
feature structure of figure 2.1. 
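
The construction of a concrete representative can be sketched for the finite (acyclic) case, where every equivalence class can be listed explicitly. In the Python fragment below the AFS is given by a dictionary from paths (tuples of feature names) to types and by a collection of equivalence classes; this encoding, the use of integers as node identities and the function name are assumptions of the example.

```python
def concretize(big_theta, classes):
    """Build one concrete TFS from a finite AFS: one node per equivalence class."""
    # Number the classes in a fixed order so that the result is deterministic.
    ordered = sorted(classes, key=sorted)
    node_of = {p: i for i, c in enumerate(ordered) for p in c}
    theta = {node_of[p]: t for p, t in big_theta.items()}
    # The arc labeled by the last feature of a path leaves the node of its prefix.
    delta = {(node_of[p[:-1]], p[-1]): node_of[p] for p in node_of if p}
    return node_of[()], theta, delta


# Hypothetical AFS in which the paths SUBJ AGR and HEAD AGR are reentrant.
big_theta = {(): "phrase", ("SUBJ",): "head", ("HEAD",): "head",
             ("SUBJ", "AGR"): "agr", ("HEAD", "AGR"): "agr"}
classes = [frozenset({()}), frozenset({("SUBJ",)}), frozenset({("HEAD",)}),
           frozenset({("SUBJ", "AGR"), ("HEAD", "AGR")})]
root, theta, delta = concretize(big_theta, classes)
```

The resulting graph has four nodes, and the `AGR` arcs leaving the `SUBJ` and `HEAD` values lead to the same node. Prefix-closure guarantees that the prefix of every non-empty path has a node, and fusion-closure guarantees that the arc dictionary does not depend on which member of a class is used.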

Theorem 2.4.4 If $A' \in Conc(A)$ then $Abs(A') = A$. 

Proof: Let $A = (\Pi_A, \Theta_A, \approx_A)$, $A' = (Q, \bar{q}, \delta)$, $Abs(A') = (\Pi, \Theta, \approx)$. If $A' \in Conc(A)$ then $Q$ can 
be mapped by a one-to-one function to the set of equivalence classes of $\approx_A$ and $\delta$ determines the 
paths in $\Pi_A$. By the definition of $Abs$, $\Pi = \Pi_A$. Given a path $\pi \in \Pi_A$, $\Theta(\pi) = \theta(\delta(\bar{q}, \pi)) = 
\theta(q_{[\pi]}) = \Theta_A(\pi)$. If $\pi_1 \approx_A \pi_2$ then $\delta(\bar{q}, \pi_1) = \delta(\bar{q}, \pi_2)$ (since $A$ is fusion-closed) and hence $\pi_1 \approx \pi_2$. 

AFSs can be partially ordered: $(\Pi_A, \Theta_A, \approx_A) \preceq (\Pi_B, \Theta_B, \approx_B)$ iff $\Pi_A \subseteq \Pi_B$, $\approx_A \subseteq \approx_B$ and for 
every $\pi \in \Pi_A$, $\Theta_A(\pi) \sqsubseteq \Theta_B(\pi)$. This order corresponds to the subsumption ordering on TFSs, as 
the following theorems show. 

$\Pi = \{\epsilon, \text{SYN}, \text{SUBJ}, \text{SUBJ AGR}, \text{HEAD}, \text{HEAD AGR}\}$ 

Figure 2.3: An abstract feature structure 



Theorem 2.4.5 $A \sqsubseteq B$ iff $Abs(A) \preceq Abs(B)$. 

Proof: Let $Abs(A) = (\Pi_A, \Theta_A, \approx_A)$, $Abs(B) = (\Pi_B, \Theta_B, \approx_B)$. Assume that $A \sqsubseteq B$, that is, a 
subsumption morphism $h : Q_A \to Q_B$ exists. If $\pi \in \Pi_A$ then (from the definition of $Abs(A)$) 
$\delta_A(\bar{q}_A, \pi)\downarrow$, that is, there exists a sequence $q_0, q_1, \ldots, q_n$ of nodes and a sequence $f_1, \ldots, f_n$ of 
features such that for every $i$, $0 \le i < n$, $\delta_A(q_i, f_{i+1}) = q_{i+1}$, $q_0 = \bar{q}_A$ and $\pi = f_1 \cdots f_n$. 
Due to the subsumption morphism, there exists a sequence of nodes $h(q_0), \ldots, h(q_n)$ such that 
$\delta_B(h(q_i), f_{i+1}) = h(q_{i+1})$ for every $i$, $0 \le i < n$, and $h(q_0) = \bar{q}_B$. Hence $\pi \in \Pi_B$. Moreover, since 
$A \sqsubseteq B$, for every node $q$, $\theta(q) \sqsubseteq \theta(h(q))$. In particular, $\theta(q_n) \sqsubseteq \theta(h(q_n))$ and thus $\Theta_A(\pi) \sqsubseteq \Theta_B(\pi)$. 
Now suppose that two paths $\pi_1, \pi_2$ are reentrant in $A$. By the definition of subsumption, $\pi_1$ and 
$\pi_2$ are reentrant in $B$, too. Therefore $\approx_A \subseteq \approx_B$. 

If $Abs(A) \preceq Abs(B)$, construct a function $h : Q_A \to Q_B$ such that $h(\bar{q}_A) = \bar{q}_B$ and for every 
$q \in Q_A$, $h(\delta_A(q, f)) = \delta_B(h(q), f)$. Trivially, $h$ is total and $h(\bar{q}_A) = \bar{q}_B$. Also, if $\delta_A(q, f)\downarrow$ then 
$h(\delta_A(q, f)) = \delta_B(h(q), f)$. To show that $\theta(q) \sqsubseteq \theta(h(q))$ for every $q$, consider a path $\pi$ leading 
from $\bar{q}_A$ to $q$. Since $Abs(A) \preceq Abs(B)$, $\Theta_A(\pi) \sqsubseteq \Theta_B(\pi)$ and hence $\theta(q) \sqsubseteq \theta(h(q))$. Hence $h$ is a 
subsumption morphism. 

Theorem 2.4.6 For every $A \in Conc(A')$, $B \in Conc(B')$, $A \sqsubseteq B$ iff $A' \preceq B'$. 

Proof: Select some $A \in Conc(A')$, $B \in Conc(B')$. If $A \sqsubseteq B$ then, by theorem 2.4.5, $Abs(A) \preceq 
Abs(B)$. By the definition of $Conc$, $Abs(A) = A'$ and $Abs(B) = B'$, so that $A' \preceq B'$. 
If $A' \preceq B'$, construct a function $h : Q_A \to Q_B$ as follows: First, let $h(\bar{q}_A) = \bar{q}_B$. Then, perform 
a depth-first search on the graph $A$ and for every node $q' = \delta_A(q, f)$ encountered, if $h(q')\uparrow$ set 
$h(q') = \delta_B(h(q), f)$. The order of the search is irrelevant: since $A' \preceq B'$, $\approx_{A'} \subseteq \approx_{B'}$ and therefore 
if $\pi_1 \approx_{A'} \pi_2$ then $\pi_1 \approx_{B'} \pi_2$. Since $A' \preceq B'$, $\Pi_{A'} \subseteq \Pi_{B'}$ and hence $\delta_B(h(q), f)$ is defined whenever 
$\delta_A(q, f)$ is defined. Hence $h$ is total and $h(\bar{q}_A) = \bar{q}_B$. For every node $q \in Q_A$, some path $\pi$ exists 
that leads from $\bar{q}_A$ to $q$ and from $\bar{q}_B$ to $h(q)$. $\Theta_{A'}(\pi) \sqsubseteq \Theta_{B'}(\pi)$, and therefore $\theta(q) \sqsubseteq \theta(h(q))$. Hence 
$h$ is a subsumption morphism. 

Corollary 2.4.7 $A \sim B$ iff $Abs(A) = Abs(B)$. 

Proof: Immediate from theorem 2.4.5. 

Corollary 2.4.8 $Conc(A) \sim Conc(B)$ iff $A = B$. 

Proof: Immediate from theorem 2.4.6. 
2.5 Unification 



As there exists a one-to-one correspondence between abstract feature structures and (alphabetic 
variants of) concrete ones, we define unification directly over AFSs. This leads to a simpler 
definition that captures the essence of the operation better than the traditional definition. We use 
the term 'unification' to refer to both the operation and its result. 

Lemma 2.5.1 If $A = (\Pi_A, \Theta_A, \approx_A)$ is a pre-AFS then there exists a pre-AFS $B = (\Pi_B, \Theta_B, \approx_B)$ 
such that $B$ is the least extension of $A$ to a fusion-closed structure and $\Theta_B(\pi) = \Theta_A(\pi)$ for every 
$\pi \in \Pi_A$. 

Lemma 2.5.2 If $A = (\Pi_A, \Theta_A, \approx_A)$ is a pre-AFS then there exists a pre-AFS $B = (\Pi_B, \Theta_B, \approx_B)$ 
such that $\Pi_A = \Pi_B$, $\Theta_A = \Theta_B$ and $\approx_B$ is the least extension of $\approx_A$ to an equivalence relation. 

Definition 2.5.3 (Closure operations) Let $Cl$ be a fusion-closure operation on pre-AFSs: 
$Cl(A) = A'$, where $A'$ is the least extension of $A$ to a fusion-closed structure. Let $Eq((\Pi, \Theta, \approx)) = 
(\Pi, \Theta, \approx')$ where $\approx'$ is the least extension of $\approx$ to an equivalence relation. Let $Ty((\Pi, \Theta, \approx)) = 
(\Pi, \Theta', \approx)$ where $\Theta'(\pi) = \bigsqcup_{\pi' \approx \pi} \Theta(\pi')$. 

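Definition 2.5.4 (Unification) The unification $A \sqcup B$ of two AFSs $A = (\Pi_A, \Theta_A, \approx_A)$ and $B = 
(\Pi_B, \Theta_B, \approx_B)$ is the AFS $C' = Ty(Eq(Cl(C)))$, where: 

• $C = (\Pi_C, \Theta_C, \approx_C)$ 

• $\Pi_C = \Pi_A \cup \Pi_B$ 

• \[ \Theta_C(\pi) = \begin{cases} \Theta_A(\pi) \sqcup \Theta_B(\pi) & \text{if } \pi \in \Pi_A \text{ and } \pi \in \Pi_B \\ \Theta_A(\pi) & \text{if } \pi \in \Pi_A \text{ only} \\ \Theta_B(\pi) & \text{if } \pi \in \Pi_B \text{ only} \end{cases} \] 

• $\approx_C = \approx_A \cup \approx_B$ 

The unification fails if there exists a path $\pi \in \Pi_{C'}$ such that $\Theta_{C'}(\pi) = \top$. 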
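
The definition above works on sets of paths, which may be infinite. For experimentation it can be convenient to compute a concrete representative of the result on graphs instead. The sketch below uses the familiar union-find style of graph unification, which is a different technique from the $Ty(Eq(Cl(C)))$ construction and is named here as such; it is expected to yield a concrete counterpart of the same result, but that correspondence is not established here. The toy hierarchy, the encoding of a TFS as `(root, theta, delta)` and all names are assumptions of this example, and a result of `None` stands for failure.

```python
# Hypothetical hierarchy: SUBTYPES[t] is the set of types subsumed by t.
SUBTYPES = {"bot": {"bot", "agr", "sg", "pl"}, "agr": {"agr", "sg", "pl"},
            "sg": {"sg"}, "pl": {"pl"}}


def lub(t1, t2):
    """Type unification; None plays the role of the inconsistent type."""
    common = SUBTYPES[t1] & SUBTYPES[t2]
    least = [t for t in common if common <= SUBTYPES[t]]
    return least[0] if least else None


def unify(a, b):
    """Graph unification of two TFSs; returns a new TFS or None on failure."""
    (root_a, theta_a, delta_a), (root_b, theta_b, delta_b) = a, b
    # Tag the nodes so that the two node sets are disjoint.
    theta = {("a", q): t for q, t in theta_a.items()}
    theta.update({("b", q): t for q, t in theta_b.items()})
    arcs = {}
    for tag, delta in (("a", delta_a), ("b", delta_b)):
        for (q, f), t in delta.items():
            arcs.setdefault((tag, q), {})[f] = (tag, t)
    parent = {q: q for q in theta}

    def find(q):
        while parent[q] != q:
            q = parent[q]
        return q

    todo = [(("a", root_a), ("b", root_b))]
    while todo:
        x, y = (find(q) for q in todo.pop())
        if x == y:
            continue
        t = lub(theta[x], theta[y])
        if t is None:
            return None                      # inconsistent types: failure
        parent[y], theta[x] = x, t           # merge y into x
        merged = arcs.setdefault(x, {})
        for f, target in arcs.pop(y, {}).items():
            if f in merged:
                todo.append((merged[f], target))   # shared feature: unify values
            else:
                merged[f] = target
    # Read the result off the representatives reachable from the root.
    root = find(("a", root_a))
    out_theta, out_delta, stack = {}, {}, [root]
    while stack:
        q = stack.pop()
        if q in out_theta:
            continue
        out_theta[q] = theta[q]
        for f, target in arcs.get(q, {}).items():
            out_delta[(q, f)] = find(target)
            stack.append(find(target))
    return root, out_theta, out_delta
```

As a usage example, unifying a structure whose `F` and `G` values are shared (type `agr`) with one whose `F` value is `sg` gives a structure in which both features lead to a single `sg` node; if the second structure additionally requires `G` to be `pl`, the unification fails.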

Lemma 2.5.5 $Cl$ preserves prefixes: If $A$ is a prefix-closed pre-AFS and $A' = Cl(A)$ then $A'$ is 
prefix-closed. 

Proof: Let $\pi$ be a path in $\Pi'$. If $\pi \in \Pi$ then every prefix of $\pi$ is in $\Pi'$, since $\Pi$ is prefix-closed 
and $Cl$ only adds paths. Suppose that $\pi \in \Pi' \setminus \Pi$. Then there exist $\pi_1, \pi_2, \alpha_1, \alpha_2 \in \textsc{Paths}$ such 
that $\pi_1\alpha_1 \in \Pi$ and $\pi_2\alpha_2 \in \Pi$ and $\pi_1 \approx \pi_2$ and $\pi = \pi_1\alpha_2$ (otherwise, $\pi$ can be removed from $\Pi'$, 
in contradiction to the minimality of $Cl$). If $\pi'$ is a prefix of $\pi$ then either $\pi'$ is a prefix of $\pi_1$, in 
which case $\pi' \in \Pi$ since $\Pi$ is prefix-closed, or $\pi' = \pi_1\alpha'$ for some $\alpha'$ that is a prefix of $\alpha_2$. Since $\Pi$ 
is prefix-closed and $\pi_2\alpha_2 \in \Pi$, $\pi_2\alpha' \in \Pi$. Therefore, as $\pi_1\alpha_1 \in \Pi$ and $\pi_1 \approx \pi_2$, $\pi_1\alpha'$ is added to $\Pi'$ by the closure 
operation. 

Lemma 2.5.6 $Eq$ preserves prefixes and fusions: If $A$ is a prefix- and fusion-closed pre-AFS and 
$A' = Eq(A)$ then $A'$ is prefix- and fusion-closed. 

Proof: $Eq$ extends $\approx$ to an equivalence relation. Since only $\approx$ is modified, prefix-closure is trivially 
maintained. Select a pair $(\pi_1, \pi_2) \in \approx' \setminus \approx$. Then either (1) $\pi_2 = \pi_1$; (2) $\pi_2 \approx \pi_1$; or (3) there exists 
a path $\pi_3$ such that $\pi_1 \approx \pi_3$ and $\pi_3 \approx \pi_2$. Trivially, (1) and (2) preserve the closure properties. In 
the case of (3), to show that fusion-closure is maintained we have to show that if $\pi_1\alpha_1 \in \Pi'$ and 
$\pi_2\alpha_2 \in \Pi'$ then $\pi_1\alpha_2 \in \Pi'$ and $\pi_1\alpha_2 \approx' \pi_2\alpha_2$. Since $\Pi = \Pi'$, $\pi_1\alpha_1 \in \Pi$ and $\pi_2\alpha_2 \in \Pi$. Since $\Pi$ is 
fusion-closed and $\pi_3 \approx \pi_2$, $\pi_3\alpha_2 \in \Pi$ and $\pi_3\alpha_2 \approx \pi_2\alpha_2$. Since $\pi_1 \approx \pi_3$, $\pi_1\alpha_2 \in \Pi$ and $\pi_1\alpha_2 \approx \pi_3\alpha_2$, 
too. $\approx'$ is an extension of $\approx$ to an equivalence relation, and thus $\pi_1\alpha_2 \approx' \pi_2\alpha_2$. 

Corollary 2.5.7 If $A$ and $B$ are AFSs, then so is $A \sqcup B$. 

Proof: If $A$ and $B$ are AFSs then the pre-AFS $C$, defined as in 2.5.4, is prefix-closed (since $A$ 
and $B$ are). $Cl(C)$ is prefix- and fusion-closed, as is $Eq(Cl(C))$ in which, additionally, $\approx$ is an 
equivalence relation. $Ty(Eq(Cl(C)))$ is an AFS, since $Ty$ only modifies $\Theta$ such that it respects the 
equivalences. 

$C'$ is the smallest AFS that contains $\Pi_C$ and $\approx_C$. Since $\Pi_A$ and $\Pi_B$ are prefix-closed, so is $\Pi_C$. 
However, $\Pi_C$ and $\approx_C$ might not be fusion-closed. This is why $Cl$ is applied to them. As a result 
of its application, new paths and equivalence classes might be added. By lemma 2.5.5, if a path is 
added all its prefixes are added, too, so the prefix-closure is preserved. Then, $Eq$ extends $\approx$ to an 
equivalence relation, without harming the prefix- and fusion-closure properties (by lemma 2.5.6). 
Finally, $Ty$ sees to it that $\Theta$ respects the equivalences. 

Lemma 2.5.8 Unification is commutative: $A \sqcup B = B \sqcup A$. 

Proof: Observe that unification is defined using set union ($\cup$) and type unification ($\sqcup$) which are 
commutative. Therefore, the unification is commutative, too. 

Lemma 2.5.9 Unification is associative: $(A \sqcup B) \sqcup C = A \sqcup (B \sqcup C)$. 

Proof: as above. 

The result of a unification can differ from any of its arguments in three ways: paths that were 
not present can be added; the types of nodes can become more specific; and reentrancies can be 
added, that is, the number of equivalence classes of paths can decrease. Consequently, the result 
of a unification is always more specific than any of its arguments. 

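Theorem 2.5.10 If $C' = A \sqcup B$ then $A \preceq C'$. 

Proof: $\Pi_C = \Pi_A \cup \Pi_B$ and hence $\Pi_A \subseteq \Pi_C$. $\approx_C = \approx_A \cup \approx_B$ and hence $\approx_A \subseteq \approx_C$. If $\pi \in \Pi_A$ then 
$\Theta_C(\pi) = \Theta_A(\pi)$ or $\Theta_C(\pi) = \Theta_A(\pi) \sqcup \Theta_B(\pi)$, and in any case $\Theta_A(\pi) \sqsubseteq \Theta_C(\pi)$. $Cl$ and $Eq$ cannot 
remove paths or equivalences and $Ty$ only makes types more specific, and therefore $A \preceq C'$. 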

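Theorem 2.5.11 $A \sqcup B = A$ iff $B \preceq A$. 

Proof: Suppose $B \preceq A$. Then $\Pi_B \subseteq \Pi_A$, $\approx_B \subseteq \approx_A$ and for every $\pi \in \Pi_B$, $\Theta_B(\pi) \sqsubseteq \Theta_A(\pi)$. 
$A \sqcup B = Ty(Eq(Cl(C)))$ where $C = (\Pi_C, \Theta_C, \approx_C)$ and 

• $\Pi_C = \Pi_A \cup \Pi_B = \Pi_A$ 

• \[ \Theta_C(\pi) = \begin{cases} \Theta_A(\pi) \sqcup \Theta_B(\pi) & \text{if } \pi \in \Pi_A \text{ and } \pi \in \Pi_B \\ \Theta_A(\pi) & \text{if } \pi \in \Pi_A \text{ only} \\ \Theta_B(\pi) & \text{if } \pi \in \Pi_B \text{ only} \end{cases} \;=\; \Theta_A(\pi) \] 

• $\approx_C = \approx_A \cup \approx_B = \approx_A$ 

Hence $C = A$ and therefore $A \sqcup B = A$. 

Suppose $A \sqcup B = A$ and assume toward a contradiction that $B \not\preceq A$. Then at least one of the 
following cases holds: 

• $\Pi_B \not\subseteq \Pi_A$. Then there exists $\pi \in \Pi_B \cup \Pi_A$ such that $\pi \notin \Pi_A$ and hence $A \sqcup B \neq A$. 

• There exists some $\pi$ such that $\Theta_B(\pi) \not\sqsubseteq \Theta_A(\pi)$. Then $\Theta_A(\pi) \sqcup \Theta_B(\pi) \neq \Theta_A(\pi)$ and hence 
$A \sqcup B \neq A$. 

• $\approx_B \not\subseteq \approx_A$. Then there exist $\pi_1, \pi_2$ such that $\pi_1 \approx_B \pi_2$ but not $\pi_1 \approx_A \pi_2$. Hence $(\approx_A 
\cup \approx_B) \neq \approx_A$ and $A \sqcup B \neq A$. 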



TFSs (and therefore AFSs) can be seen as a generalization of first-order terms (FOTs) (see (Carpenter, 
1991)). Accordingly, AFS unification resembles FOT unification; however, the notion of 
substitution that is central to the definition of FOT unification is missing here, and as far as we 
know, no analog to substitutions in the domain of feature structures was ever presented. 



2.6 A Linear Representation of Feature Structures 

Representing feature structures as either graphs or attribute-value matrices is cumbersome; we 
now define a linear representation for feature structures, based upon Aït-Kaci's $\psi$-terms (though 
the order relation we use is reversed). 

Definition 2.6.1 (Arity) The arity of a type $t$ is the number of features appropriate for it, i.e. 
$|\{f \mid Approp(f,t)\downarrow\}|$. 

Note that in every totally well-typed feature structure of type t the number of edges leaving the 
root is exactly the arity of t. Consequently, we use the term 'arity' for (totally well-typed) feature 
structures: the arity of a feature structure of type t is defined to be the arity of t. 

In order to define the set of well-formed linear terms over Feats and Types, we assume that 
the feature names are ordered in a fixed order. 

Let $\{\boxed{i} \mid i \text{ is a natural number}\}$ be the set of tags. 

Definition 2.6.2 (Terms) A term $\tau$ of type $t$ is an expression of the form $\boxed{i}\,t(\tau_1, \ldots, \tau_n)$ where 
$\boxed{i}$ is a tag, $n \ge 0$ and every $\tau_j$ is a term of some type. 

Definition 2.6.3 (Totally well-typed terms) A term $\tau = \boxed{i}\,t(\tau_1, \ldots, \tau_n)$ of type $t$ is totally 
well-typed iff: 

• $t$ is a type of arity $n$; 

• the appropriate features for the type $t$ are $f_1, \ldots, f_n$, in this order; 

• for every $j$, $1 \le j \le n$, $Approp(f_j, t) = t_j$; 

• for every $j$, $1 \le j \le n$, if $\tau_j$ is a term of type $t'_j$ then either $t_j \sqsubseteq t'_j$ or $t'_j = \bot$. 
We distinguish tags that appear in terms according to the type they are attached to: if a sub-term 
consists of a tag and the type $\bot$, we say that the tag is independent. Otherwise, the tag is 
dependent. We will henceforth consider only terms for which the following definition holds: 

Definition 2.6.4 (Normal terms) A totally well-typed term $\psi = \boxed{i}\,t(\tau_1, \ldots, \tau_n)$ is normal iff: 

• $t \neq \top$; 

• if a tag $\boxed{j}$ appears in $\psi$ then its first (leftmost) occurrence might be dependent. If it appears 
more than once, its other occurrences are independent. 

• $\tau_1, \ldots, \tau_n$ are normal terms. 

We use terms to represent feature structures. We define below an algebra over which terms are 
to be interpreted. The denotation of a normal term is a totally well-typed feature structure. 

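Definition 2.6.5 (Feature structure algebra) A feature structure algebra is a structure 
$\mathcal{A} = (D^{\mathcal{A}}, \{t^{\mathcal{A}} \mid t \in \textsc{Types}\}, \{f^{\mathcal{A}} \mid f \in \textsc{Feats}\})$, such that: 

• $D^{\mathcal{A}}$ is a non-empty set, the domain of $\mathcal{A}$; 

• for each $t \in \textsc{Types}$, $t^{\mathcal{A}} \subseteq D^{\mathcal{A}}$ and, in particular: 

- $\top^{\mathcal{A}} = \emptyset$; 

- $\bot^{\mathcal{A}} = D^{\mathcal{A}}$; 

- if $t_1 \sqcup t_2 = t$ then $t_1^{\mathcal{A}} \cap t_2^{\mathcal{A}} = t^{\mathcal{A}}$ 

• for each $f \in \textsc{Feats}$, $f^{\mathcal{A}}$ is a total function $f^{\mathcal{A}} : D^{\mathcal{A}} \to D^{\mathcal{A}}$ 

Let $D^{\mathcal{G}}$ be the domain of all typed feature structures over Types and Feats. The interpretation 
$t^{\mathcal{G}}$ of $t$ over this domain is the set of feature structures whose roots have the type $t$; the interpretation 
$f^{\mathcal{G}} : D^{\mathcal{G}} \to D^{\mathcal{G}}$ of $f$ is the function that, given a feature structure $A$, returns $val(A, f)$. 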

We associate a normal term $\psi$ with a totally well-typed feature structure $A$ in the following 
way: 

• if $\psi = \boxed{i}\,t$ then $A = (\{\boxed{i}\}, \boxed{i}, \delta, \theta)$ where $\delta$ is undefined for every input and $\theta(q) = t$ if $q = \boxed{i}$ 
and undefined otherwise; 

• if $\psi = \boxed{i}\,t(\tau_1, \ldots, \tau_n)$ then $A = (Q, \boxed{i}, \delta, \theta)$ where $\theta(\boxed{i}) = t$ and for every $j$, if $f_j$ is the $j$-th 
appropriate feature of the type $t$, then $\delta(\boxed{i}, f_j) = q_j$ and $q_j$ is the root of the feature structure 
associated with $\tau_j$. 

Conversely, associate a feature structure $A = (Q, \bar{q}, \delta, \theta)$ with a normal term $\psi = \boxed{i}\,t(\tau_1, \ldots, \tau_n)$, 
where: 

• $\boxed{i} = \bar{q}$; 

• $\theta(\bar{q}) = t$; 

• $n$ is the number of outgoing edges from $\bar{q}$; 

• for every $j$, $1 \le j \le n$, $\tau_j$ is the term associated with $\delta(\bar{q}, f_j)$ where $f_j$ is the $j$-th appropriate 
feature of $t$; 

• if the tag $\boxed{i}$ occurs elsewhere in $\tau_1, \ldots, \tau_n$, we replace the term that $\boxed{i}$ depends on with the 
term $\bot()$, making this occurrence of $\boxed{i}$ independent. 

To summarize, there is a one-to-one correspondence between totally well-typed feature structures 
and normal terms. 

Note that the tags are only a means of encoding reentrancy in feature structures. Therefore, 
when displaying a term in which a tag $\boxed{i}$ appears just once, we will sometimes omit the 
tag for the sake of compactness. Similarly, we sometimes omit the type of independent tags, which 
are implicitly typed by $\bot$, and display them as tags only. 
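
The mapping from feature structures to linear terms can be sketched as a traversal that assigns a tag to each node on its first visit and prints later visits as bare tags. The code below is only an illustration: the bracket notation `[n]` for tags, the list `feature_order` standing for the fixed order on Feats, the TFS encoding and the function name are assumptions of this example, and atoms are printed without an empty argument list.

```python
def to_term(root, theta, delta, feature_order):
    """Render a TFS as a linear term; repeated nodes are shown as tags only."""
    tags, seen = {}, set()

    def render(q):
        tag = tags.setdefault(q, len(tags) + 1)
        if q in seen:
            return f"[{tag}]"            # independent occurrence of the tag
        seen.add(q)
        args = [render(delta[(q, f)])
                for f in feature_order if (q, f) in delta]
        body = f"({', '.join(args)})" if args else ""
        return f"[{tag}]{theta[q]}{body}"

    return render(root)


# Hypothetical structure: the HEAD and SUBJ values share their AGR value.
theta = {0: "phrase", 1: "head", 2: "head", 3: "agr"}
delta = {(0, "SUBJ"): 1, (0, "HEAD"): 2, (1, "AGR"): 3, (2, "AGR"): 3}
term = to_term(0, theta, delta, ["AGR", "HEAD", "SUBJ"])
```

With features ordered alphabetically, the shared value is spelled out under HEAD (its leftmost occurrence) and appears only as a tag under SUBJ, giving the term `[1]phrase([2]head([3]agr), [4]head([3]))`.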



2.7 Multi-rooted Structures 

To be able to represent complex linguistic information, such as phrase structure, the notion of 
feature structures is usually extended. There are two different approaches for representing phrase 
structure in feature structures: by adding special, designated features to the FSs themselves; or 
by defining an extended notion of FSs. The first approach is employed by HPSG: special features, 
such as DTRS (daughters), encode trees in TFSs as lists. This makes it impossible to directly 
access a particular daughter. Shieber (1992) uses a variant of this approach, where a denumerable 
set of special features, namely 0, 1, ..., are added to encode the order of daughters in a tree. In 
a typed system such as ours, this method would necessitate the addition of special types as well; 
in general, no bound can be placed on the number of features and types necessary to state rules 
(see (Carpenter, 1992b, p. 194)). 

As a more coherent, mathematically elegant solution, we adopt below the other approach: 
a new notion of multi-rooted feature structures, suggested by (Sikkel, 1993), is defined to 
naturally extend TFSs. These structures provide a means to represent phrasal signs and grammar 
rules. They are used implicitly in the computational linguistics literature, but to the best of our 
knowledge no explicit, formal theory of these structures and their properties was formulated before. 

Definition 2.7.1 (Multi-rooted structures) A multi-rooted feature structure (MRS) is a 
pair $(\bar{Q}, G)$ where $G = (Q, \delta)$ is a finite, directed, labeled graph consisting of a set $Q \subseteq \textsc{Nodes}$ of 
nodes and a partial function $\delta : Q \times \textsc{Feats} \to Q$ specifying the arcs, and where $\bar{Q}$ is an ordered, 
(repetition-free) list of distinguished nodes in $Q$ called roots. $G$ is not necessarily connected, but 
the union of all the nodes reachable from all the roots in $\bar{Q}$ is required to yield exactly $Q$. The 
length of a MRS is the number of its roots, $|\bar{Q}|$. $\lambda$ denotes the empty MRS, where $Q = \emptyset$. 

Meta-variables $\sigma, \rho$ range over MRSs, and $\delta$, $Q$ and $\bar{Q}$ over their constituents. If $(\bar{Q}, G)$ is a 
MRS and $\bar{q}_i$ is a root in $\bar{Q}$ then $\bar{q}_i$ naturally induces a feature structure $Pr(\bar{Q}, i) = (Q_i, \bar{q}_i, \delta_i)$, 
where $Q_i$ is the set of nodes reachable from $\bar{q}_i$ and $\delta_i = \delta|_{Q_i}$. 

One can view a MRS {Q, G) as an ordered sequence {Ai, . . . , An) of (not necessarily disjoint) 
feature structures, where Ai — Pr{Q,i) for 1 < j < n. Note that such an ordered list of feature 
structures is not a sequence in the mathematical sense: removing an element from the list may 
effect the other elements (due to reentrancy among elements). Nevertheless, we can think of a MRS 
as a sequence where a subsequence is obtained by taking a subsequence of the roots and considering 
only the feature structures they induce. We use the two views interchangeably. Figure |2.4| depicts 
a MRS and its view as a sequence of feature structures. The shaded nodes (ordered from left to 
right) constitute Q. 

A MRS is well-typed if all its constituent feature structures are well-typed, and is totally
well-typed if all its constituents are. Subsumption is extended to MRSs as follows:

Figure 2.4: A graph- and AVM-representation of a MRS



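Definition 2.7.2 (Subsumption of multi-rooted structures) A MRS $\sigma = \langle \bar{Q}, G \rangle$ subsumes
a MRS $\sigma' = \langle \bar{Q}', G' \rangle$ (denoted by $\sigma \sqsubseteq \sigma'$) if $|\bar{Q}| = |\bar{Q}'|$ and there exists a total function
$h : Q \rightarrow Q'$ such that:

• for every root $\bar{q}_i \in \bar{Q}$, $h(\bar{q}_i) = \bar{q}'_i$

• for every $q \in Q$, $\theta(q) \sqsubseteq \theta'(h(q))$

• for every $q \in Q$ and $f \in \textsc{Feats}$, if $\delta(q, f)\downarrow$ then $h(\delta(q, f)) = \delta'(h(q), f)$

We define abstract multi-rooted structures analogously to abstract feature structures.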

Definition 2.7.3 (Abstract multi-rooted structures) A pre-abstract multi-rooted structure
(pre-AMRS) is a quadruple $A = \langle Ind, \Pi, \Theta, \approx \rangle$, where:

• $Ind$, the indices of $A$, is the set $\{1, \ldots, n\}$ for some $n$

• $\Pi \subseteq Ind \times \textsc{Paths}$ is a set of indexed paths, such that for each $i \in Ind$ there exists some
$\pi \in \textsc{Paths}$ such that $(i, \pi) \in \Pi$

• $\Theta : \Pi \rightarrow \textsc{Types}$ is a total type-assignment function

• $\approx \; \subseteq \Pi \times \Pi$ is a relation

An abstract multi-rooted structure (AMRS) is a pre-AMRS $A$ for which the following requirements,
naturally extending those of AFSs, hold:

• $\Pi$ is prefix-closed: if $(i, \pi\alpha) \in \Pi$ then $(i, \pi) \in \Pi$

• $A$ is fusion-closed: if $(i, \pi\alpha) \in \Pi$ and $(i', \pi'\alpha') \in \Pi$ and $(i, \pi) \approx (i', \pi')$ then $(i, \pi\alpha') \in \Pi$
(as well as $(i', \pi'\alpha) \in \Pi$) and $(i, \pi\alpha') \approx (i', \pi'\alpha')$ (as well as $(i', \pi'\alpha) \approx (i, \pi\alpha)$)

• $\approx$ is an equivalence relation with a finite index

• $\Theta$ respects the equivalence: if $(i_1, \pi_1) \approx (i_2, \pi_2)$ then $\Theta(i_1, \pi_1) = \Theta(i_2, \pi_2)$

An AMRS $\langle Ind, \Pi, \Theta, \approx \rangle$ is well-typed if for every $(i, \pi) \in \Pi$, $\Theta(i, \pi) \neq \top$ and if $(i, \pi f) \in \Pi$
then $Approp(f, \Theta(i, \pi))\downarrow$ and $Approp(f, \Theta(i, \pi)) \sqsubseteq \Theta(i, \pi f)$. It is totally well-typed if, in
addition, for every $(i, \pi) \in \Pi$, if $Approp(f, \Theta(i, \pi))\downarrow$ then $(i, \pi f) \in \Pi$. The length of an AMRS $A$
is $len(A) = |Ind_A|$. We use $\lambda$ to denote the empty AMRS, too, where $Ind_\lambda = \emptyset$ and $\Pi_\lambda = \emptyset$ (so
that $len(\lambda) = 0$).

The closure operations $Cl$ and $Eq$ are naturally extended to AMRSs: If $A$ is a pre-AMRS then
$Cl(A)$ is the least extension of $A$ that is prefix- and fusion-closed, and $Eq(A)$ is the least
extension of $A$ to a pre-AMRS in which $\approx$ is an equivalence relation. In addition,
$Ty(\langle Ind, \Pi, \Theta, \approx \rangle) = \langle Ind, \Pi, \Theta', \approx \rangle$ where $\Theta'(i, \pi) = \bigsqcup_{(i', \pi') \approx (i, \pi)} \Theta(i', \pi')$. The partial order $\sqsubseteq$
is extended to AMRSs: $\langle Ind_A, \Pi_A, \Theta_A, \approx_A \rangle \sqsubseteq \langle Ind_B, \Pi_B, \Theta_B, \approx_B \rangle$ iff $Ind_A = Ind_B$, $\Pi_A \subseteq \Pi_B$,
$\approx_A \subseteq \approx_B$ and for every $(i, \pi) \in \Pi_A$, $\Theta_A(i, \pi) \sqsubseteq \Theta_B(i, \pi)$. In the rest of this chapter we
overload the symbol '$\sqsubseteq$' so that it denotes subsumption of AMRSs as well as MRSs.
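
The subsumption test on AMRSs translates almost literally into code. The following Python sketch is illustrative only and not part of the formalism: it assumes that an AMRS is stored as a tuple (ind, paths, types, equiv) of plain sets and a dictionary, and that type subsumption is supplied by the caller as a hypothetical predicate `type_leq`. The toy types in the example are invented for the illustration.

```python
def subsumes(a, b, type_leq):
    """Return True iff AMRS `a` subsumes AMRS `b`.

    Each AMRS is a tuple (ind, paths, types, equiv): a set of indices, a set
    of (index, path) pairs, a dict from those pairs to type names, and a set
    of pairs of (index, path) pairs. `type_leq(s, t)` is an assumed predicate
    deciding whether type s subsumes type t.
    """
    ind_a, paths_a, types_a, equiv_a = a
    ind_b, paths_b, types_b, equiv_b = b
    return (ind_a == ind_b
            and paths_a <= paths_b
            and equiv_a <= equiv_b
            and all(type_leq(types_a[p], types_b[p]) for p in paths_a))


# A toy type order, listed explicitly (hypothetical types).
ORDER = {("bot", "bot"), ("bot", "sign"), ("bot", "agr"),
         ("sign", "sign"), ("agr", "agr")}


def type_leq(s, t):
    return (s, t) in ORDER


root, agr = (1, ()), (1, ("AGR",))
general = ({1}, {root}, {root: "bot"}, {(root, root)})
specific = ({1}, {root, agr}, {root: "sign", agr: "agr"},
            {(root, root), (agr, agr)})

print(subsumes(general, specific, type_leq))  # True
print(subsumes(specific, general, type_leq))  # False
```

The four conjuncts correspond one-to-one to the four conditions in the definition of the partial order above.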

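AMRSs, too, can be related to concrete ones in a natural way: If $\sigma = \langle \bar{Q}, G \rangle$ is a MRS then
$Abs(\sigma) = \langle Ind_\sigma, \Pi_\sigma, \Theta_\sigma, \approx_\sigma \rangle$ is defined by:

• $Ind_\sigma = \{1, \ldots, |\bar{Q}|\}$

• $\Pi_\sigma = \{(i, \pi) \mid \delta(\bar{q}_i, \pi)\downarrow\}$

• $\Theta_\sigma(i, \pi) = \theta(\delta(\bar{q}_i, \pi))$

• $(i, \pi_1) \approx_\sigma (j, \pi_2)$ iff $\delta(\bar{q}_i, \pi_1) = \delta(\bar{q}_j, \pi_2)$

It is easy to see that $Abs(\sigma)$ is an AMRS. In particular, notice that for every $i \in Ind_\sigma$ there
exists a path $\pi$ such that $(i, \pi) \in \Pi_\sigma$ since for every $i$, $\delta(\bar{q}_i, \epsilon)\downarrow$. The reverse operation,
$Conc$, can be defined in a similar manner.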
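
The four clauses defining $Abs$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation from this work: nodes are arbitrary hashable values, the graph is assumed to be acyclic (a cyclic graph has infinitely many paths, and this naive enumeration would not terminate), and the function and argument names (`abstract`, `roots`, `delta`, `theta`) are hypothetical.

```python
from itertools import product


def abstract(roots, delta, theta):
    """Compute Abs of an acyclic multi-rooted structure.

    roots: ordered list of root nodes
    delta: dict mapping (node, feature) to the target node
    theta: dict mapping each node to its type
    Returns (ind, paths, types, equiv) with 1-based indices; paths are
    tuples of feature names, the empty tuple being the empty path.
    """
    reach = {}  # (index, path) -> node reached from that root by that path
    for i, root in enumerate(roots, start=1):
        stack = [((), root)]
        while stack:
            path, node = stack.pop()
            reach[(i, path)] = node
            for (source, feature), target in delta.items():
                if source == node:
                    stack.append((path + (feature,), target))
    ind = set(range(1, len(roots) + 1))
    paths = set(reach)
    types = {ip: theta[node] for ip, node in reach.items()}
    # Two indexed paths are equivalent iff they lead to the same node.
    equiv = {(p, q) for p, q in product(reach, repeat=2)
             if reach[p] == reach[q]}
    return ind, paths, types, equiv


# Two roots (nodes 0 and 1) that share their AGR value (node 2).
ind, paths, types, equiv = abstract(
    roots=[0, 1],
    delta={(0, "AGR"): 2, (1, "AGR"): 2},
    theta={0: "sign", 1: "sign", 2: "agr"},
)
print(((1, ("AGR",)), (2, ("AGR",))) in equiv)  # True
```

In the example, the reentrancy between the two roots shows up as an equivalence between the indexed paths (1, AGR) and (2, AGR), while the two empty paths remain inequivalent.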

AMRSs are used to represent ordered collections of AFSs. However, due to the possibility of 
value sharing among the constituents of AMRSs, they are not sequences in the mathematical sense, 
and the notion of sub-structure has to be defined in order to relate them to AFSs. 

Definition 2.7.4 (Sub-structures) Let $A = \langle Ind_A, \Pi_A, \Theta_A, \approx_A \rangle$; let $Ind_B$ be a finite
(contiguous) subset of $Ind_A$; let $n + 1$ be the index of the first element of $Ind_B$. The sub-structure
of $A$ induced by $Ind_B$ is an AMRS $B = \langle Ind_B, \Pi_B, \Theta_B, \approx_B \rangle$ such that:

• $(i - n, \pi) \in \Pi_B$ iff $i \in Ind_B$ and $(i, \pi) \in \Pi_A$

• $\Theta_B(i - n, \pi) = \Theta_A(i, \pi)$ if $i \in Ind_B$

• $(i_1 - n, \pi_1) \approx_B (i_2 - n, \pi_2)$ iff $i_1 \in Ind_B$, $i_2 \in Ind_B$ and $(i_1, \pi_1) \approx_A (i_2, \pi_2)$

A sub-structure of $A$ is obtained by selecting a subsequence of the indices of $A$ and considering
the structure they induce. Trivially, this structure is an AMRS. We use $A^{j \ldots k}$ to refer to the
sub-structure of $A$ induced by $\{j, \ldots, k\}$. If $Ind_B = \{i\}$, $A^{i \ldots i}$ can be identified with an AFS,
denoted $A^i$.

The notion of concatenation has to be defined for AMRSs, too. Notice that by definition, 
concatenated AMRSs cannot share elements between them. 

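Definition 2.7.5 (Concatenation) The concatenation of $A = \langle Ind_A, \Pi_A, \Theta_A, \approx_A \rangle$ and
$B = \langle Ind_B, \Pi_B, \Theta_B, \approx_B \rangle$ of lengths $n_A, n_B$, respectively (denoted by $A \cdot B$), is an AMRS
$C = \langle Ind_C, \Pi_C, \Theta_C, \approx_C \rangle$ such that

• $Ind_C = \{1, \ldots, n_A + n_B\}$

• $\Pi_C = \Pi_A \cup \{(i + n_A, \pi) \mid (i, \pi) \in \Pi_B\}$

• $\Theta_C(i, \pi) = \Theta_A(i, \pi)$ if $i \leq n_A$, and $\Theta_C(i, \pi) = \Theta_B(i - n_A, \pi)$ if $i > n_A$

• $\approx_C \; = \; \approx_A \cup \; \{((i_1 + n_A, \pi_1), (i_2 + n_A, \pi_2)) \mid (i_1, \pi_1) \approx_B (i_2, \pi_2)\}$

As usual, $A \cdot \lambda = \lambda \cdot A = A$.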
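
Sub-structures (Definition 2.7.4) and concatenation (Definition 2.7.5) are both index-shifting operations, which the following Python sketch makes explicit. It is an illustration under assumptions rather than part of the formalism: an AMRS is represented as a tuple (ind, paths, types, equiv) of plain sets and a dictionary, and the function names and the example types are hypothetical.

```python
def substructure(amrs, j, k):
    """Sub-structure induced by the contiguous indices j..k (inclusive)."""
    ind, paths, types, equiv = amrs
    n = j - 1  # indices are shifted so that j becomes 1

    def inside(i):
        return j <= i <= k

    return (
        set(range(1, k - j + 2)),
        {(i - n, p) for (i, p) in paths if inside(i)},
        {(i - n, p): t for (i, p), t in types.items() if inside(i)},
        {((i1 - n, p1), (i2 - n, p2))
         for (i1, p1), (i2, p2) in equiv if inside(i1) and inside(i2)},
    )


def concatenate(a, b):
    """Concatenation of a and b: the indices of b are shifted by len(a)."""
    ind_a, paths_a, types_a, equiv_a = a
    ind_b, paths_b, types_b, equiv_b = b
    n_a = len(ind_a)
    types_c = dict(types_a)
    types_c.update({(i + n_a, p): t for (i, p), t in types_b.items()})
    return (
        set(range(1, n_a + len(ind_b) + 1)),
        paths_a | {(i + n_a, p) for (i, p) in paths_b},
        types_c,
        equiv_a | {((i1 + n_a, p1), (i2 + n_a, p2))
                   for (i1, p1), (i2, p2) in equiv_b},
    )


def single(cat):
    """A length-1 AMRS of type sign whose CAT value has type `cat`."""
    root, path = (1, ()), (1, ("CAT",))
    return ({1}, {root, path}, {root: "sign", path: cat},
            {(root, root), (path, path)})


A, B = single("np"), single("v")
C = concatenate(A, B)
print(C[2][(2, ("CAT",))])          # v
print(substructure(C, 1, 1) == A)   # True
print(substructure(C, 2, 2) == B)   # True
```

Because concatenation adds no equivalences between the two operands, taking the sub-structures of the result recovers the operands exactly, as the last two lines check.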

We now extend the definition of unification to AMRSs: we want to allow the unification of two 
AMRSs, according to a specified set of indices. Therefore, one operand is a pair consisting of an 
AMRS and a set of indices, specifying some elements of it. The second operand is either an AMRS 
or an AFS, considered as an AMRS of length 1. Recall that due to reentrancies, other elements of 
the first AMRS can be affected by this operation. Therefore, the result of the unification is a new 
AMRS. We refer to AMRS unification as "unification in context" in the sequel to emphasize the 
effect that the operation might have on elements that are not directly involved in it. 

Definition 2.7.6 (Unification of AMRSs) Let $A = \langle Ind_A, \Pi_A, \Theta_A, \approx_A \rangle$ be an AMRS. Let
$B = \langle Ind_B, \Pi_B, \Theta_B, \approx_B \rangle$ be an AMRS (if $B$ is an AFS it is interpreted as an AMRS of length 1).
Let $J$ be a set of indices such that $J \subseteq Ind_A$. Let $f(i) = i$ if $B$ is an AMRS, $f(i) = 1$ if $B$ is an
AFS. $(A, J) \sqcup B$ is defined if $B$ is an AMRS and $J \subseteq Ind_B$, or if $B$ is an AFS and $|J| = 1$; in
any case, it is the AMRS $C' = Ty(Eq(Cl(\langle Ind_C, \Pi_C, \Theta_C, \approx_C \rangle)))$, where

• $Ind_C = Ind_A$

• $\Pi_C = \Pi_A \cup \{(i, \pi) \mid i \in J$ and $(f(i), \pi) \in \Pi_B\}$

• $\Theta_C(i, \pi) =$
  $\Theta_A(i, \pi)$ if $i \notin J$
  $\Theta_A(i, \pi) \sqcup \Theta_B(f(i), \pi)$ if $i \in J$ and $(i, \pi) \in \Pi_A$ and $(f(i), \pi) \in \Pi_B$
  $\Theta_A(i, \pi)$ if $i \in J$ and $(i, \pi) \in \Pi_A$ and $(f(i), \pi) \notin \Pi_B$
  $\Theta_B(f(i), \pi)$ if $i \in J$ and $(i, \pi) \notin \Pi_A$ and $(f(i), \pi) \in \Pi_B$

• $\approx_C \; = \; \approx_A \cup \; \{((i_1, \pi_1), (i_2, \pi_2)) \mid i_1, i_2 \in J$ and $(f(i_1), \pi_1) \approx_B (f(i_2), \pi_2)\}$

The unification fails if there exists some pair $(i, \pi) \in \Pi_{C'}$ such that $\Theta_{C'}(i, \pi) = \top$.

Many of the properties of AFSs, proven in the previous section, hold for AMRSs, too. In
particular, if $A, B$ are AMRSs then so is $(A, J) \sqcup B$ if it is defined, $len((A, J) \sqcup B) = len(A)$ and
$(A, J) \sqcup B \sqsupseteq A$. Also, for every two AMRSs $A, B$, $(A, \{1 \ldots len(A)\}) \sqcup B = A$ iff $B^{1 \ldots len(A)} \sqsubseteq A$.



The linear representation of TFSs, suggested in section 2.6, is naturally extended to MRSs: a
multi-term is a sequence of terms, where the scope of tags is extended to the entire sequence.

2.8 Rules and Grammars



We define rules and grammars over a fixed set $\textsc{Words}$ of words (in addition to the fixed sets
$\textsc{Feats}$ and $\textsc{Types}$). We use $w_i$ to refer to elements of $\textsc{Words}$ and $w$ to refer to strings over
$\textsc{Words}$. We assume that the lexicon associates with every word $w_i$ a set of feature structures
$Cat(w_i)$, its category, so we can ignore the terminal words and consider only their categories. The
input for the parser, therefore, is a sequence^ of sets of TFSs rather than a string of words.

Definition 2.8.1 (Pre-terminals) Let $w = w_1 \ldots w_n \in \textsc{Words}^*$. $PT_w(j, k)$ is defined iff
$1 \leq j, k \leq n$, in which case it is the set of AMRSs $Abs(\langle A_j, A_{j+1}, \ldots, A_k \rangle)$ where $A_i \in Cat(w_i)$ for
$j \leq i \leq k$. If $j > k$ then $PT_w(j, k) = \{\lambda\}$. We omit the subscript $w$ when it is clear from the
context.

Lemma 2.8.2 If $w = w_1 \cdots w_n$, $1 \leq i \leq j \leq k \leq n$, $A \in PT_w(i, j)$ and $B \in PT_w(j + 1, k)$ then
$A \cdot B \in PT_w(i, k)$.

Proof: An immediate corollary of the definition.

Definition 2.8.3 (Rules) A rule is a MRS of length $n > 0$ with a distinguished last element.
If $\langle A_1, \ldots, A_{n-1}, A_n \rangle$ is a rule then $A_n$ is its head^ and $\langle A_1, \ldots, A_{n-1} \rangle$ is its body.^ We write
such a rule as $\langle A_1, \ldots, A_{n-1} \Rightarrow A_n \rangle$. In addition, every category of a lexical item is a rule (with
an empty body). We assume that such categories don't head any other rule.

Notice that the definition supports $\epsilon$-rules, i.e., rules with null bodies.

Definition 2.8.4 (Grammars) A grammar $G = (\mathcal{R}, A_s)$ is a finite set of rules $\mathcal{R}$ and a start
symbol $A_s$ that is a TFS.

Figure 2.5 depicts an example grammar (we use AVM notation for the rules; tags such as [1]
denote reentrancy). While this example grammar has no linguistic pretensions, it might be viewed
as generating simple sentences in which the predicates are headed by transitive and intransitive
verbs. The type hierarchy on which the grammar is based is omitted here. A discussion of the
methodological status of the start symbol appears later on in this section, prior to the definition
of languages.

For the following discussion we fix a particular grammar $G = (\mathcal{R}, A_s)$. We define a derivation
relation over AMRSs as the basis for defining the language of TFS-based grammars. Checking
whether two given AMRSs $A$ and $B$ stand in the derivation relation is accomplished by the following
steps: first, an element of $A$ has to be selected; this element has to unify with the head of some
rule $\rho$; then, a sub-structure of $B$ is selected; this sub-structure has to unify with the body of $\rho$.
All unifications are done in context, so that other components of the AMRSs involved may be
affected, too. Moreover, there must be some way to record the effects of successive unifications; to
this end, derivation is defined only for pairs of AMRSs that are already "as specific as needed";
that is to say, if the rule adds any information to the AMRSs, this information already has to be
recorded in them in order for them to be related by derivation. This is why, in the definition
below, we use an AMRS $R$ that is at least as specific as some rule $\rho$, and not $\rho$ itself, to guide the
derivation. This is also why the definition requires that all the unifications do not add
information. Strong derivation is the relation that holds between such AMRSs; another relation,
derivation, relaxes that requirement by allowing two AMRSs to be related even if they contain only
part of the information that is required for strong derivation to hold.

^$Cat(w_i)$ is a singleton if $w_i$ is unambiguous.

^We assume that there is no reentrancy among lexical items.

^This use of head must not be confused with the linguistic one, the core features of a phrase.

^Notice that the traditional direction is reversed and that the head and the body need not be disjoint.

Figure 2.5: An example grammar (panels: initial symbol, lexicon, rules)

Since elements of AMRSs involve indices that denote their linear position in the sequence of
roots, the operation of replacing some element in one AMRS with a sub-structure, whose length
might be greater than one, becomes notationally complicated. Conceptually, though, it closely
resembles the replacement of some symbol with a sequence of symbols in context-free derivation,
or the replacement of the selected goal (that unifies with the head of some rule) with the body
of the rule, in Prolog SLD-resolution. One main difference in our definition is that we do not
carry substitutions through sequences of derivations; rather, we treat all the pairs in a derivation
sequence as if the appropriate substitutions have already been applied to them (recall that members
of these pairs are "as specific as needed").

Definition 2.8.5 (Strong Derivation) An AMRS $A = \langle Ind_A, \Pi_A, \Theta_A, \approx_A \rangle$ whose length is $k$
strongly derives an AMRS $B$ (denoted $A \leadsto B$) iff

• there exist a rule $\rho \in \mathcal{R}$ and an AMRS $R \sqsupseteq Abs(\rho)$ (with $len(R) = n$), such that:

• some element of $A$ unifies with the head of $R$, and some sub-structure of $B$ unifies with the
body of $R$; namely, there exist $j \in Ind_A$ and $i \in Ind_B$ such that:

  $A = (A, \{j\}) \sqcup R^n$, $\quad B^{i \ldots i+n-2} = (B, \{i \ldots i + n - 2\}) \sqcup R^{1 \ldots n-1}$
  $R = (R, \{n\}) \sqcup A^j$, $\quad R = (R, \{1 \ldots n - 1\}) \sqcup B^{i \ldots i+n-2}$

• $B$ is the replacement of the $j$-th element of $A$ with the body of $R$; namely, let

  $f(i) = i$ if $1 \leq i < j$, $\quad f(i) = i + n - 2$ if $j < i \leq k$, $\quad g(i) = i + j - 1$ if $1 \leq i < n$

then $B = Ty(Eq(Cl(\langle Ind_{B'}, \Pi_{B'}, \Theta_{B'}, \approx_{B'} \rangle)))$, where

- $Ind_{B'} = \{1, \ldots, k + n - 2\}$

- $(i, \pi) \in \Pi_{B'}$ iff $i = f(i')$ and $(i', \pi) \in \Pi_A$, or $i = g(i')$ and $(i', \pi) \in \Pi_R$

- $\Theta_{B'}(i, \pi) = \Theta_A(i', \pi)$ if $i = f(i')$, and $\Theta_{B'}(i, \pi) = \Theta_R(i', \pi)$ if $i = g(i')$

- $(i_1, \pi_1) \approx_{B'} (i_2, \pi_2)$ if

  * $i_1 = f(i'_1)$ and $i_2 = f(i'_2)$ and $(i'_1, \pi_1) \approx_A (i'_2, \pi_2)$, or

  * $i_1 = g(i'_1)$ and $i_2 = g(i'_2)$ and $(i'_1, \pi_1) \approx_R (i'_2, \pi_2)$, or

  * $i_1 = f(i'_1)$ and $i_2 = g(i'_2)$ and there exists $\pi_3$ such that $(i'_1, \pi_1) \approx_A (j, \pi_3)$ and
    $(n, \pi_3) \approx_R (i'_2, \pi_2)$, or

  * $i_1 = g(i'_1)$ and $i_2 = f(i'_2)$ and there exists $\pi_3$ such that $(i'_2, \pi_2) \approx_A (j, \pi_3)$ and
    $(n, \pi_3) \approx_R (i'_1, \pi_1)$

The reflexive transitive closure of '$\leadsto$', denoted '$\stackrel{*}{\leadsto}$', is defined as follows: $A \stackrel{*}{\leadsto} A''$ if $A = A''$ or
if there exists $A'$ such that $A \leadsto A'$ and $A' \stackrel{*}{\leadsto} A''$.

Intuitively, $A$ strongly derives $B$ through some AFS $A^j$ in $A$, if some rule $\rho \in \mathcal{R}$ licenses the
derivation. $A^j$ is unified with the head of the rule, and if the unification succeeds, the (possibly
modified) body of the rule replaces $A^j$ in $B$. The definition is graphically demonstrated in figure 2.6.

Lemma 2.8.6 If $A \leadsto B$ and $A \sqsubseteq A'$ then there exists $B'$ such that $B \sqsubseteq B'$ and $A' \leadsto B'$.

Proof: $A \leadsto B$, therefore there exists a rule $\rho \in \mathcal{R}$, an AMRS $R \sqsupseteq Abs(\rho)$ and an index $j$ such
that $A^j$ unifies with the head of $R$, and $B$ is obtained by replacing $A^j$ with the body of $R$. $A$ and $B$
are already "as specific as needed"; thus, since $A \sqsubseteq A'$ and $A = (A, \{j\}) \sqcup R^n$, $A' = (A', \{j\}) \sqcup R^n$.
Hence there exists $R' \sqsupseteq R$ such that $R' = (R', \{n\}) \sqcup A'^j$, $A'^j$ unifies with its head and $B'$ is
obtained by replacing the $j$-th element of $A'$ with its body.

Lemma 2.8.7 If $A \stackrel{*}{\leadsto} B$ and $A \sqsubseteq A'$ then there exists $B'$ such that $B \sqsubseteq B'$ and $A' \stackrel{*}{\leadsto} B'$.

Proof: By induction on the derivation sequence and lemma 2.8.6.

Lemma 2.8.8 If $A^{1 \ldots k} \stackrel{*}{\leadsto} B$ and $A^{k+1} \stackrel{*}{\leadsto} C$ then $A^{1 \ldots k+1} \stackrel{*}{\leadsto} B \cdot C$.

Proof: The derivation is obtained by applying first the derivation steps that derive $B$ from $A^{1 \ldots k}$
and then those that derive $C$ from $A^{k+1}$. Since $A^{1 \ldots k} \stackrel{*}{\leadsto} B$, $A$ is "as specific as needed" and the
application of the derivation steps from $A$ to $B$ does not affect the applicability of the derivation
steps to $C$.

Figure 2.6: Strong derivation

Lemma 2.8.9 If $A \sqsupseteq Abs(\rho)$ for some $\rho \in \mathcal{R}$ of length $n$ then $A^n \leadsto A^{1 \ldots n-1}$.

Proof: Immediate from the definition of derivation.

There are various definitions in the literature for the language that is defined by a grammar
$G$ expressed in a unification-based grammar formalism. For example, (Shieber, 1985; Shieber,
1992) do not include a start symbol in the grammar at all, and define $L(G)$ as the set of strings
derivable from some feature structure. In (Shieber, Schabes, and Pereira, 1994) a start symbol
is defined (notated goal axiom), and $L(G)$ is defined as the set of strings that are derivable from
some generalization of the start symbol, i.e., from some feature structure that subsumes it. (Sikkel,
1993), on the other hand, assumes that a specific feature cat is present in every feature structure
(the value of which simulates non-terminals in a context-free "underlying" grammar), and uses
this feature to single out the start symbol: $L(G)$ is the set of strings that are derivable from some
feature structure in which the cat feature is $S$ (the start symbol of the underlying context-free
grammar). A similar definition is given by (Haas, 1989): $L(G)$ is the set of strings derivable from
the start symbol, where the start symbol is a constant (that is, an atomic feature structure).

There is a good motivation to employ a start symbol: the grammar writer might want to
specify a certain criterion for the permissible strings in the language, for example, that they are all
sentences. Moreover, it makes sense to include in the language such strings that are not derived
directly by the start symbol, but rather by a TFS that is related to the start symbol. For example,
the grammar writer might state that only TFSs with a cat feature valued $S$ are permissible, meaning
that every TFS that is subsumed by the start symbol (that is, contains all the information it
encodes) is a sentence. However, such a definition prevents the incorporation of a subsumption test
(see section 2.10.3 below) into the parsing, since the correctness of the computation cannot be
maintained.

Due to these considerations we chose a relaxed condition on the start symbol in our definition of
languages. We define a derivation relation between AMRSs in a way that allows the initial symbol of
the grammar to derive a sequence of lexical entries even if the actual (strong) derivation is between
a TFS that unifies with the start symbol and a more specific instance of the pre-terminals.

Definition 2.8.10 (Derivation) An AMRS $A$ derives an AMRS $B$ ($A \rightarrow B$) iff there exist
AMRSs $A', B'$ such that $(A, \{1, \ldots, len(A)\}) \sqcup A' \neq \top$, $B \sqsubseteq B'$ and $A' \stackrel{*}{\leadsto} B'$.

Definition 2.8.11 (Language) The language of a grammar $G$ is $L(G) = \{w = w_1 \cdots w_n \in
\textsc{Words}^* \mid Abs(A_s) \rightarrow B$ for some $B \in PT_w(1, n)\}$.

Figure 2.7 depicts a derivation of the string "John loves her" with respect to the example grammar.
The scope of reentrancy tags is limited to one MRS, of course, but we use the same tags across
different MRSs to emphasize the flow of information during derivation. This example shows that
the sentence "John loves her" is in the language of the example grammar, since the derivation
starts with a TFS that is more specific than the initial symbol and ends in a specification of the
lexical entries of the sentence's words.



Figure 2.7: A leftmost derivation

2.9 Parsing as Operational Semantics



Parsing is the process of determining whether a given string belongs to the language defined by a
given grammar, and assigning a structure to the permissible strings. Various parsing algorithms
exist for various classes of grammars. In this section we formalize and explicate some of the
notions of (Carpenter, 1992b, chapter 13). We give direct definitions for rules, grammars and
languages, based on our new notion of AMRSs. This presentation is more adequate to current
TFS-based systems than (Haas, 1989; Shieber, Schabes, and Pereira, 1994), that use first-order
terms. Moreover, it does not necessitate special, ad-hoc features and types for encoding trees
in TFSs as (Shieber, 1992) does. We don't assume any explicit context-free backbone for the
grammars, as do (Kaplan and Bresnan, 1982) or (Sikkel, 1993).



The parsing algorithm we describe is a pure bottom-up one that makes use of a chart to record 
edges. The formalism we presented is aimed at being a platform for specifying grammars in HPSG, 
which is characterized by employing a few very general rules (or rule schemata) ; selecting the rules 
that are applicable in every step of the process requires unification anyhow. Therefore we choose 
a particular parsing algorithm that does not make use of top down predictions but rather assumes 
that every rule might be applied in every step. This assumption is realized by initializing the chart 
with predictive edges for every rule, in every position. 



As is well known (see, e.g., (Lloyd, 1987)), the meaning of a logic program $P$ can be specified
algebraically as the least fix-point (lfp) of the immediate consequence operator $T_P$ of the program.
A similar approach can be applied to a context-free grammar $G$, such that $L(G)$ equals (a projection
of) the least fix-point of an analogous immediate derivation operator, $T_G$. Let $G = \langle V, T, P, S \rangle$
be a context-free grammar.^ Let $I \subseteq V \times T^*$. Define $T_G(I) = \{(A, w) \mid A \rightarrow w \in P, w \in T\} \cup
\{(A, w_1 \cdots w_k) \mid A \rightarrow A_1 \cdots A_k \in P, (A_i, w_i) \in I, 1 \leq i \leq k\}$. Then the least fix-point of $T_G$
is the union over $A \in V$ of $\{(A, w) \mid w \in L_A(G)\}$.

In a sense, computing the lfp of $T_G$ corresponds to computing the language generated by $G$.
Parsing, then, amounts to checking if the input $w$ is in the language. This process induces an
inherently inefficient computation: since $w$ is given, it can be used to optimize the computation.
This is achieved by defining $T_{G,w}$, a parsing step operator, which is dependent on the input sentence
$w$. The set of items $I$ has to be extended, too: an item is a triple $[i, A, j]$ where $0 \leq i, j \leq n$ ($n$ being
the length of $w$) and $A \in V$. Informally, an item $[i, A, j]$ represents the existence of a derivation for
the symbol $A$ to a substring of $w$, namely $w_i \ldots w_j$. $w \in L_S(G)$ if and only if $[1, S, n] \in \mathrm{lfp}(T_{G,w})$,
so that parsing now amounts to computing the least fixed point of $T_{G,w}$, which is more efficient,
and then checking whether the appropriate item is in the lfp.
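
The context-free case can be made concrete with a small Python sketch of a parsing step operator and its iteration to a fix-point. It is only an illustration under assumptions that go beyond the text: the grammar is given as two hypothetical lists, `lexical` (rules $A \rightarrow a$) and `phrasal` (rules $A \rightarrow A_1 \cdots A_k$ with $k \geq 1$, so $\epsilon$-rules are left out), an item $[i, A, j]$ is the tuple `(i, A, j)` with 1-based inclusive positions, and the toy grammar is invented for the example.

```python
def parsing_step(lexical, phrasal, words, items):
    """One application of the parsing step operator to the set `items`."""
    n = len(words)
    result = set()
    # Lexical rules A -> a contribute [i, A, i] wherever w_i = a.
    for lhs, terminal in lexical:
        for i in range(1, n + 1):
            if words[i - 1] == terminal:
                result.add((i, lhs, i))
    # Phrasal rules A -> A_1 ... A_k combine adjacent items found in `items`.
    for lhs, body in phrasal:
        spans = {(i, j) for (i, symbol, j) in items if symbol == body[0]}
        for symbol in body[1:]:
            spans = {(i, j2)
                     for (i, j) in spans
                     for (i2, s, j2) in items
                     if s == symbol and i2 == j + 1}
        result.update((i, lhs, j) for (i, j) in spans)
    return result


def least_fixpoint(lexical, phrasal, words):
    """Iterate the operator from the empty set until nothing changes."""
    items = set()
    while True:
        following = parsing_step(lexical, phrasal, words, items)
        if following == items:
            return items
        items = following


LEXICAL = [("NP", "john"), ("NP", "her"), ("V", "loves")]
PHRASAL = [("S", ("NP", "VP")), ("VP", ("V", "NP"))]

chart = least_fixpoint(LEXICAL, PHRASAL, ["john", "loves", "her"])
print((1, "S", 3) in chart)  # True
```

Since the operator is monotone and only finitely many items exist for a fixed input, the iteration from the empty set forms an increasing chain and stops after finitely many steps; membership of the item for the start symbol spanning the whole input then decides whether the string is accepted.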

We now return to TFS-based formalisms and define $T_{G,w}$ for a TFS-based grammar $G$, thus
providing means for defining the meaning of $G$. A computation is triggered by some input string
of words $w = w_1 \cdots w_n$ of length $n \geq 0$. For the following discussion we fix a particular grammar
$G = (\mathcal{R}, A_s)$ and a particular input string $w$ of length $n$. A state of the computation is a set of
items, and states are related by a transition relation. The presentation below corresponds to a
pure bottom-up parsing algorithm, as it is both simple and efficient.

Definition 2.9.1 (Items) An item is a four-tuple $[i, A, j, c]$, where $i, j \in \mathbb{N}$, $i \leq j$, $A$ is an
AMRS and $c$ is either $\textsc{Act}$, in which case the item is active, or $\textsc{Comp}$, in which case it is
complete. Let $\textsc{Items}$ be the collection of all items.

If $[i, A, j, c]$ is an item, we say that $A$ spans the input from position $i + 1$ to position $j$ (inclusive).
$A$ can be seen as a representation of a dotted rule, or edge: during parsing all generated items
are such that $A$ is (possibly more specific than) a prefix of some grammar rule. The notion of
items usually employs edges that contain entire rules, whereas we only use prefixes of rules. This
difference is not essential, and in an actual implementation of a parser that is induced by $T_{G,w}$,
edges indeed include a reference to the rule on which they rely.

In what follows we define $T_{G,w}$, a parsing operator that corresponds to (bottom-up) chart
parsing. However, it is possible to characterize algebraic operators that correspond to other parsing
schemas as well.

^We assume a normal form, where for $A \rightarrow \alpha \in P$, either $\alpha \in T$ or $\alpha \in V^*$.



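Definition 2.9.2 Let $T_{G,w} : 2^{\textsc{Items}} \rightarrow 2^{\textsc{Items}}$ be a transformation on sets of items, where for
$I \subseteq \textsc{Items}$, $x \in T_{G,w}(I)$ iff either

(2.1)
  $\exists \rho \in \mathcal{R}$, $Abs(\rho) = R = \langle A_1, \ldots, A_{m-1} \Rightarrow A_m \rangle$, $m > 1$
  $\exists k < m - 1$
  $\exists \alpha \in I$, $\alpha = [i_\alpha, A_\alpha, j_\alpha, \textsc{Act}]$, $len(A_\alpha) = k$
  $\exists \beta \in I$, $\beta = [i_\beta, A_\beta, j_\beta, \textsc{Comp}]$, $len(A_\beta) = 1$
  $j_\alpha = i_\beta$
  $B = (R, \{1 \ldots k\}) \sqcup A_\alpha$
  $C = (B, \{k + 1\}) \sqcup A_\beta$
  $x = [i_\alpha, C^{1 \ldots k+1}, j_\beta, \textsc{Act}]$

or

(2.2)
  $\exists \rho \in \mathcal{R}$, $Abs(\rho) = R = \langle A_1, \ldots, A_{m-1} \Rightarrow A_m \rangle$, $m > 1$
  $\exists \alpha \in I$, $\alpha = [i_\alpha, A_\alpha, j_\alpha, \textsc{Act}]$, $len(A_\alpha) = m - 1$
  $C = (R, \{1 \ldots m - 1\}) \sqcup A_\alpha$
  $x = [i_\alpha, C^m, j_\alpha, \textsc{Comp}]$

or

(2.3)
  $\exists i$, $0 \leq i \leq n$
  $x = [i, \lambda, i, \textsc{Act}]$

or

(2.4)
  $\exists \rho \in \mathcal{R}$, $len(\rho) = 1$
  $\exists i$, $0 \leq i \leq n$
  $x = [i, Abs(\rho), i, \textsc{Comp}]$

or

(2.5)
  $w = w_1, \ldots, w_n$, $n \geq 1$
  $\exists i$, $0 < i \leq n$
  $x = [i - 1, Abs(A_i), i, \textsc{Comp}]$, $A_i \in Cat(w_i)$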



Cases 2.1 and 2.2 perform the operation known as completion: 2.1 moves the dot one position
along the body of a rule, and 2.2 creates a complete item once the dot reaches the end of the
body. Case 2.3 corresponds to the prediction operation, whereas case 2.5 corresponds to scanning.
Case 2.4 handles $\epsilon$-rules, i.e., rules with null bodies, and creates complete items that span a null
substring of the input sentence. Notice that cases 2.3 and 2.4 are independent of the argument $I$
and therefore add the same items in every application of $T_{G,w}$. Case 2.5 is also independent of the
argument, but is dependent on the input sentence $w$.

The operator $T_{G,w}$, on which the algebraic semantics of TFS-based grammars is based, naturally
induces an operational semantics for such formalisms: once the operator is shown to be continuous,
a computational process that corresponds to the iterative application of $T_{G,w}$ computes the set
of items in the least fix-point of the operator. This process can be thought of as an analog of
bottom-up, chart-based parsing: the chart is initialized with predictions for every rule in every
position (by operation 2.3) and with complete edges for every input word (by operation 2.5). Then,
operations 2.1 and 2.2 are used to apply the grammar rules using the chart, in an unspecified order.
We prove below that the process is indeed analogous to parsing $w$ with respect to $G$.

Theorem 2.9.3 $T_{G,w}$ is monotone: if $I_1 \subseteq I_2$ then $T_{G,w}(I_1) \subseteq T_{G,w}(I_2)$.

Proof: Suppose $I_1 \subseteq I_2$. If $x \in T_{G,w}(I_1)$ then $x$ was added by one of the five operations;
operations 2.3, 2.4 and 2.5 add the same items every time $T_{G,w}$ is applied, and thus $x \in T_{G,w}(I_2)$,
too. If $x$ was added by operation 2.1, then there exist items $\alpha, \beta$ in $I_1$ to which this operation
applies. Since $I_1 \subseteq I_2$, $\alpha, \beta$ are in $I_2$, too, and hence $x \in T_{G,w}(I_2)$, too. The same applies for
operation 2.2. In any case, $x \in T_{G,w}(I_2)$ and hence $T_{G,w}(I_1) \subseteq T_{G,w}(I_2)$.

Theorem 2.9.4 $T_{G,w}$ is continuous: if $I_0 \subseteq I_1 \subseteq \ldots$ is a chain, then $T_{G,w}(\bigcup_{i \geq 0} I_i) = \bigcup_{i \geq 0} T_{G,w}(I_i)$.

Proof: First, $T_{G,w}$ is monotone. Second, let $I = I_0 \subseteq I_1 \subseteq \ldots$ be a chain of sets of items. If
$x \in T_{G,w}(\bigcup_{i \geq 0} I_i)$ then there exist $\alpha, \beta \in \bigcup_{i \geq 0} I_i$ as required, due to which $x$ is added. Then there
exist $i, j$ such that $\alpha \in I_i$ and $\beta \in I_j$. Let $k$ be the maximum of $i, j$. Then $\alpha, \beta \in I_k$, $x \in T_{G,w}(I_k)$
and hence $x \in \bigcup_{i \geq 0} T_{G,w}(I_i)$.

If $x \in \bigcup_{i \geq 0} T_{G,w}(I_i)$ then there exists some $i$ such that $x \in T_{G,w}(I_i)$. $I_i \subseteq \bigcup_{j \geq 0} I_j$ and since $T_{G,w}$
is monotone, $T_{G,w}(I_i) \subseteq T_{G,w}(\bigcup_{i \geq 0} I_i)$, and hence $x \in T_{G,w}(\bigcup_{i \geq 0} I_i)$. Therefore $T_{G,w}$ is continuous.

Corollary 2.9.5 The least fix-point of $T_{G,w}$ can be obtained by iteratively computing $I_{m+1} =
T_{G,w}(I_m)$, starting from $I_0 = \emptyset$ and stopping when a fix-point is reached.

Proof: By the Tarski-Knaster theorem, the lfp exists since $T_{G,w}$ is monotone; by Kleene's theorem,
since $T_{G,w}$ is continuous, the lfp can be obtained by applying the operator iteratively, starting from
$\emptyset$.

Definition 2.9.6 (Algebraic meaning) The meaning of a grammar $G$ with respect to an input
sentence $w$ is the least fix-point of the operator $T_{G,w}$.

Definition 2.9.7 (Computation) The $w$-computation triggered by $w \in \textsc{Words}^*$ is the infinite
sequence of sets of items $I_i$, $i \geq 0$, such that $I_0 = \emptyset$ and for every $m \geq 0$, $I_{m+1} = T_{G,w}(I_m)$.
The computation is terminating if there exists some $m \geq 0$ for which $I_m = I_{m+1}$ (i.e., a fix-point
is reached in finite time). The computation is successful if there exists some $m$ such that
$[0, A, n, \textsc{Comp}] \in I_m$, where $len(A) = 1$ and $A \sqcup Abs(A_s) \neq \top$; otherwise, the computation fails.

Notice that we check whether the generated items are unifiable with the initial symbol, in
accordance with the definition of languages. If the initial symbol of the grammar is interpreted
differently when languages are defined, a corresponding modification has to be made in the
condition for success.



2.10 Proof of Correctness 

In this section we show that parsing, as defined above, is (partially) correct. First, the algorithm is
sound: a $w$-computation succeeds only if $w \in L(G)$; second, it is complete: if $w \in L(G)$, it triggers
a successful $w$-computation. Computations are not guaranteed to terminate, but we show that
termination is assured for a certain subset of the grammars that are off-line parsable. We discuss
off-line parsability in section 2.10.4.

2.10.1 Soundness

In what follows we fix a particular $w$-computation $I_0, I_1, \ldots$, triggered by some input $w = w_1 \cdots w_n$.

Lemma 2.10.1 If $[i, A, j, \textsc{Comp}] \in I_l$ for some $l$ then $len(A) = 1$.

Proof: By definition of $T_{G,w}$, complete items are generated by operations 2.2, 2.4 and 2.5. All
these operations add items in which the AMRS is of length 1.

Lemma 2.10.2 If $[i, A, j, \textsc{Act}] \in I_l$ for some $l$ and $len(A) = k > 0$ then there exists $\rho \in \mathcal{R}$ such
that $Abs(\rho)^{1 \ldots k} \sqsubseteq A$.

Proof: By induction on $l$. If $l = 0$ then $I_l = \emptyset$ and the proposition holds vacuously. Assume that
the proposition holds for every $l' < l$. Suppose that $x = [i, A, j, \textsc{Act}] \in I_l$ and $A \neq \lambda$. Then $x$
must have been added by operation 2.1 (dot movement). Then $x = [i_\alpha, C^{1 \ldots k+1}, j_\beta, \textsc{Act}]$ where
$C = ((R, \{1 \ldots k\}) \sqcup A_\alpha, \{k + 1\}) \sqcup A_\beta$, namely $C \sqsupseteq R$ and thus $C^{1 \ldots k+1} \sqsupseteq R^{1 \ldots k+1}$.

Theorem 2.10.3 (Parsing invariant (a)) // [«,j4,j, c] G /; and i < j then there exist B €E 
PT^,{i + 1, j) and A' ^ B such that A ^ A' . If i ^ j then A A A. 

Proof: By induction on I. 

If ^ = then Ii and the proposition holds vacuously. 

Assume that the proposition holds for every V < I. Suppose that x — [i, A,j, c] G //. Then x must 
have been added by one of the operations. Consider each case separately: 



2.1 



dot movement: x ^ [i^, j/3, Act] where there exist a,/3 G as required and 

C = {B, {k + 1}) U Ap, B — {R, {1 . . . k}) U Aa- By the induction hypothesis, there exist 
A'^, Ba such that Aa ^ A'^ and A'^ □ Ba G PT{ia + l, ja)- Also, there exist A'^, Bp such that 
Ap A A'f^ and A'^^ ^ Bp e PT{ja + IJp). B^-^ = (i?, {1 . . . fc}) U if fc > 0, yl„ 7^ A and 



by lemma ^TOll Ag □ R, hence B'^-^ = A^. If fc = 0, B^-^ = X^ A^. Hence B^-^ A A'^ 



(ji..k □ j^i- fc^ and by lemma |2.8.7| there exists A'^ □ A'^ such that A'^. In the same 



way, there exists □ A'^ such that C^+^ A A'^^. By lemma ^.8.8| , C^-^^+^ A A'^ ■ A'[^. But 
K ■ ^p g ■A!^^Ba- Bp, and since B„ G PT{ia + 1, ja) and Bp G PT{ja + l,jp), by 



lemma 2.8.2 Ba ■ Bp G PT{ia + j/s)- The cases in which ia = or ip — jp are trivial. 



2.2 



completion: x — [ia,C™,ja, Comp] where C — {R, {1 ... m — 1}) U Aa and there exist an 
abstract rule R and an item a G as required, and (by lemma p.l0.2| ) A]^--^~^ □ R^- -^~^ . 
If ia < ia then by the induction hypothesis, there exist A'a,Ba such that Aa — > and 
A'a ^ Ba e PT{ia + IJa)- C = (i?, {1 . . . m - 1}) U Aa, hcucc C^-™-! = Aa and thus 
^i...m-i J!; Pj.qj^ lemma I^U, C™ c^-^-i^ and thus C™ A A'a- If = then 



Aa ^ X and hence C™ X. 
2.3 prediction: $x = [i, \lambda, i, \mathrm{Act}]$ and $PT(i+1, i) = \emptyset$.
2.4 ε-rules: $x = [i, Abs(\rho), i, \mathrm{Comp}]$ and $PT(i+1, i) = \emptyset$.

2.5 scanning: $x = [i-1, Abs(A_i), i, \mathrm{Comp}]$ where $A_i \in Cat(w_i)$, and $Abs(A_i) \rightarrow^{*} Abs(A_i)$ triv-
ially. $Abs(A_i) \in PT(i+1, j)$ by definition.
Theorem 2.10.4 If a computation, triggered by $w$, is successful, then $w \in L(G)$.

Proof: If a computation is successful then there exists some $m > 0$ such that $x = [0, A, n, \mathrm{Comp}] \in I_m$
where $len(A) = 1$ and $A \sqcup Abs(A_s) \neq \top$. By the parsing invariant, $A \rightarrow^{*} A'$ for some $A' \sqsupseteq B \in PT_w(1, n)$.
Hence $Abs(A_s) \rightarrow^{*} B$ and $w \in L(G)$.



2.10.2 Completeness 



The following theorem shows that one derivation step, licensed by a rule $Abs(\rho)$ of length $r$,
corresponds to $r+1$ applications of $T_{G,w}$, starting with an item that predicts the rule and advancing
the dot $r$ times, until a complete item for that rule is generated.

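Theorem 2.10.5 (Parsing invariant (b)) If $A \rightarrow^{*} A'$ and $A' \sqsupseteq B \in PT_w(i+1, j)$ then for
every $k$, $0 < k \leq len(A)$, there exists $l_k$ such that $[i_k, C_k, j_k, \mathrm{Comp}] \in I_{l_k}$, where $C_k \sqsubseteq A'^{k}$, $i_1 = i$,
$j_{len(A)} = j$ and $j_k = i_{k+1}$ for $k < len(A)$.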

Proof: By induction on $l$, the number of derivation steps from $A$ to $A'$:

If $l = 0$, $A = A' \sqsupseteq B$. Since $B \in PT_w(i+1, j)$, $B = Abs(A_{i+1}) \cdots Abs(A_j)$ where $A_k \in Cat(w_k)$
for $i+1 \leq k \leq j$. The scanning operation (2.5) of $T_{G,w}$ adds appropriate items whenever it is
applied.

Assume that A ^ D A _B □ PT„,(i + l, j) and the proposition holds for D and B. By the induction 
hypothesis, for every k, < k < len{D), there exists Ik such that [ik,Ck,jk,GOMP] G Ii^, where 
Ck E A''. Suppose that A D through a rule p of length r by expanding A^ to _ 
Then the following sequence of items is generated, where for every m, Ci...„i C ]jx...x+m-i ^ g^^-^^ 

Cr E Ay-. 

$[i, \lambda, i, \mathrm{Act}]$                    by prediction (2.3)
$[i_1, C_1, j_1, \mathrm{Comp}]$               by the induction hypothesis
$[i, C_1, j_1, \mathrm{Act}]$                  by dot movement (2.1)
$[i_2, C_2, j_2, \mathrm{Comp}]$               by the induction hypothesis
$[i, C_{1..2}, j_2, \mathrm{Act}]$             by dot movement (2.1)
...
$[i, C_{1..r-1}, j_{r-1}, \mathrm{Act}]$       by dot movement (2.1)
$[i, C_r, j, \mathrm{Comp}]$                   by completion (2.2)



Items are generated by the dot movement (2.1) operation since the conditions for its application 
obtain: it is easy to see that the indices {i,j) match; in addition, if for some m, Ci...m E jjx...x+m-i ^ 
and for every k, Ck E A'', then there exists Ci. ..,„+! that is obtained by unifying some R □ Abs{p) 
first with Ci...„i and then with C™+^, such that Ci. ..,„+! E I?^- as required. Therefore, by 
induction on m it can be shown that all the items that result from dot movement are indeed 
generated. Finally, the completion (2.2) operation is applicable and (since A ^ D) we have 

Cr E Ay. 

Theorem 2.10.6 If $w \in L(G)$ then the computation triggered by $w$ is successful.

Proof: $w \in L(G)$, hence $Abs(A_s) \rightarrow^{*} B$, where $B \in PT_w(1, n)$. Hence there exist $A', B'$ such
that $A' \sqcup Abs(A_s) \neq \top$, $B' \sqsupseteq B$ and $A' \rightarrow^{*} B'$. By the parsing invariant, there exists $l$ such that
$[0, C, n, \mathrm{Comp}] \in I_l$ where $C \sqsubseteq A'$. Hence $C \sqcup Abs(A_s) \neq \top$, and therefore the computation is
successful.
2.10.3 Subsumption Check



To assure efficient computation and eliminate redundant items, many parsing algorithms employ
a mechanism called subsumption check (see, e.g., (Shieber, 1992; Sikkel, 1993)) to filter out certain
generated items. We introduce this mechanism below and show that it does not affect the correctness
of the computation.

Define a (partial) order over items: $[i_1, A_1, j_1, c_1] \preceq [i_2, A_2, j_2, c_2]$ iff $i_1 = i_2$, $j_1 = j_2$, $c_1 = c_2$
and $A_1 \sqsubseteq A_2$. Modify the ordering on sets of items as follows: $I_1 \preceq I_2$ iff for every $x_1 \in I_1$ there
exists $x_2 \in I_2$ such that $x_2 \preceq x_1$. Sets of items are no longer ordered by inclusion, but rather by
a weaker condition that only requires the existence of a more general item (in the higher set) for
every item (in the lower set).

The subsumption filter is realized by modifying $T_{G,w}$: $x \in T_{G,w}(I)$ only if there does not exist
any item $x' \in T_{G,w}(I)$ such that $x' \prec x$. Namely, for all items that span the same substring and
have the same status (Act or Comp), only the most general one is preserved across successive
applications of $T_{G,w}$. Given the new ordering of sets of items, it can be shown that this modification
harms neither monotonicity nor continuity, and hence every computation is guaranteed
to reach a least fix-point. Obviously, the soundness of the computation is also maintained. More
interestingly, completeness is preserved, too: recall that the parsing invariant (b) states that if
$A \rightarrow^{*} A' \sqsupseteq B$ then for every $k$ some item $[i_k, C_k, j_k, \mathrm{Comp}]$ is generated such that $C_k \sqsubseteq A'^{k}$. Since
the subsumption test only leaves out an item if a more general one exists, the invariant still holds
and hence the correctness of the computation is guaranteed. Notice that if $L(G)$ had been
defined as the set of strings that are derivable from the start symbol itself, the subsumption check
might have removed crucial items, and the computation could cease to be correct.
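
The filter described above can be illustrated with a minimal Python sketch. It rests on a simplifying assumption that is not part of the formalism: an AMRS is modelled as a frozen set of atomic constraints, so that one AMRS subsumes another exactly when it is a subset of it. The names `Item`, `strictly_more_general` and `subsumption_filter` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# An item is (i, A, j, status). ASSUMPTION: the AMRS A is modelled as a
# frozenset of atomic constraints, so A subsumes B iff A is a subset of B.
Item = Tuple[int, FrozenSet[str], int, str]


def strictly_more_general(x: Item, y: Item) -> bool:
    """True iff x and y share span and status and x's AMRS properly subsumes y's."""
    (i1, a1, j1, c1), (i2, a2, j2, c2) = x, y
    return (i1, j1, c1) == (i2, j2, c2) and a1 < a2


def subsumption_filter(items: List[Item]) -> List[Item]:
    """Keep an item only if no strictly more general item is present."""
    return [x for x in items
            if not any(strictly_more_general(y, x) for y in items)]
```

Under this toy model, two items with the same span and status survive together only if their AMRSs are equal or incomparable, which is the situation the termination argument of section 2.10.4 has to bound.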



2.10.4 Termination 



It is well-known (see, e.g., (Pereira and Warren, 1983; Johnson, 1988)) that unification-based
grammar formalisms are Turing-equivalent, and therefore decidability cannot be guaranteed in 
the general case. However, for grammars that satisfy a certain restriction, termination of the 
computation can be proven. We make use of the well-foundedness result (section 2.3) to prove 
that parsing is terminating for off-line parsable grammars. 



Off-line parsability was introduced by (Kaplan and Bresnan, 1982) and adopted by (Pereira
and Warren, 1983), according to which "A grammar is off-line parsable if its context-free skeleton
is not infinitely ambiguous". As (Johnson, 1988) points out, this restriction (defined in slightly
different terms) "ensures that the number of constituent structures that have a given string as
their yield is bounded by a computable function of the length of that string". The problem with
this definition is demonstrated by (Haas, 1989): "Not every natural unification grammar has a
context-free backbone".

A context-free backbone is inherent in LFG, due to the separation of c-structure from f-structure
and the explicit demand that the c-structure be context-free. However, this notion is not well-
defined in HPSG, where phrase structure is encoded within feature structures (indeed, HPSG itself
is not well-defined in the formal language sense). Such a backbone is certainly missing in Categorial
Grammar, as there might be infinitely many categories. (Shieber, 1992) generalizes the concept of
off-line parsability but does not prove that parsing with off-line parsable grammars is terminating.
We use an adaptation of his definition below and provide a proof.

To overcome this problem, (Haas, 1989) uses a different restriction: "A grammar is depth-bounded
if for every $L > 0$ there is a $D > 0$ such that every parse tree for a sentential form of $L$
symbols has depth less than $D$". (Shieber, 1992) generalizes it and we use an adaptation of his
definition below.



Definition 2.10.7 (Finite-range decreasing functions) A function $F : D \rightarrow D$, where $D$ is a
partially-ordered set, is finite-range decreasing (FRD) iff the range of $F$ is finite and for every
$d \in D$, $F(d) \preceq d$.



Definition 2.10.8 (Strong off-line parsability) A grammar is strongly off-line parsable iff
there exists an FRD-function $F$ from AMRSs to AMRSs (partially ordered by subsumption) such
that for every string $w$ and different AMRSs $A, B$ such that $A \neq B$, if $A \rightarrow^{*} PT_w(i+1, j)$ and
$B \rightarrow^{*} PT_w(i+1, j)$ then $F(A) \neq F(B)$.

Strong off-line parsability guarantees that any particular sub-string of the input can only be
spanned by a finite number of AMRSs: if a grammar is strongly off-line parsable, there cannot
exist an infinite set $S$ of AMRSs such that, for some $0 \leq i < j \leq |w|$, $s \rightarrow^{*} PT_w(i+1, j)$ for every
$s \in S$. If such a set existed, $F$ would have mapped its elements to the set $\{F(s) \mid s \in S\}$. This set
is infinite since $S$ is infinite and $F$ does not map two different items to the same image, and thus
the finite range assumption on $F$ is contradicted.



As (Shieber, 1992) points out, "there are non-off-line-parsable grammars for which termination
holds". We use below a more general notion of this restriction: we require that $F$ produce a
different output on A and B only if they are incomparable with respect to subsumption. We 
thereby extend the class of grammars for which parsing is guaranteed to terminate (although there 
still remain decidable grammars for which even the weaker restriction doesn't hold). 

Definition 2.10.9 (Weak off-line parsability) A grammar $G$ is weakly off-line parsable iff
there exists an FRD-function $F$ from AMRSs to AMRSs (partially ordered by subsumption) such
that for every string $w$ and different AMRSs $A, B$ such that $A \neq B$, if $A \rightarrow^{*} PT_w(i+1, j)$,
$B \rightarrow^{*} PT_w(i+1, j)$, $A \not\sqsubseteq B$ and $B \not\sqsubseteq A$, then $F(A) \neq F(B)$.

Clearly, strong off-line parsability implies weak off-line parsability. However, as we show below,
the converse implication does not hold.

We now prove that weakly off-line parsable grammars guarantee termination of parsing in the 
presence of acyclic AMRSs. We prove that if these conditions hold, only a finite number of different 
items can be generated during a computation. The main idea is the following: if an infinite number 
of different items were generated, then an infinite number of different items must span the same 
sub-string of the input (since the input is fixed and finite). By the parsing invariant, this would 
mean that an infinite number of AMRSs derive the same sub-string of the input. This, in turn, 
contradicts the weak off-line parsability constraint. 

Theorem 2.10.10 If $G$ is weakly off-line parsable and AMRSs are acyclic then every computation
terminates.

Proof: Fix a computation triggered by $w$ of length $n$. We claim that there is only a finite number
of generated items. Observe that the indices that determine the span of items are bounded:
$0 \leq i \leq j \leq n$. It remains to show that only a finite number of AMRSs are generated. Let
$x = [i, A, j, c]$ be a generated item. Suppose another item is generated where only the AMRS is
different: $x' = [i, B, j, c]$ and $A \neq B$. If $A \sqsubseteq B$, $x'$ will not be preserved because of the subsumption
test. There is only a finite number of AMRSs $A'$ such that $A' \sqsubseteq A$ (since subsumption is well-
founded for acyclic AMRSs). Now suppose $A \not\sqsubseteq B$ and $B \not\sqsubseteq A$. By the parsing invariant (a) there
exist $A', B'$ such that $A \rightarrow^{*} A' \in PT_w(i+1, j)$ and $B \rightarrow^{*} B' \in PT_w(i+1, j)$. Since $G$ is off-line
parsable, $F(A) \neq F(B)$. Since the range of $F$ is finite, there are only finitely many items with
equal span that are pairwise incomparable. Since only a finite number of items can be generated
and the computation uses a finite number of operations, the least fix-point is reached within a
finite number of steps.

The above proof relies on the well-foundedness of subsumption, and indeed termination of parsing
is not guaranteed by weak off-line parsability for grammars based on cyclic TFSs. Obviously,
cycles can occur during unification even if the unificands are acyclic. However, it is possible (albeit
costly, from a practical point of view) to spot them during parsing. Indeed, many implementations
of logic programming languages, as well as of unification-based grammars (e.g., ALE (Carpenter,
1992a)), do not check for cycles. If cyclic TFSs are allowed, the stricter notion of strong off-line
parsability is needed. Under the strong condition the above proof is applicable for the case of
non-well-founded subsumption as well.

To exemplify the difference between strong and weak off-line parsability, consider a grammar 
G that contains the following single rule: 



t 




t 









and the single lexical entry, wi, whose category is: 

Cat{wi) = 



This lexical entry can be derived by an infinite number of TFSs: 
t 



/ 



t 



t 



Cat{wi) 



It is easy to see that no FRD-function can distinguish (in pairs) among these TFSs, and hence 
the grammar is not strongly off-line parsable. The grammar is, however, weakly off-line parsable: 
since the TFSs that derive each lexical entry form a subsumption chain, the antecedent of the 
implication in the definition for weak off-line parsability never holds; even trivial functions such 
as the function that returns the empty TFS for every input are appropriate FRD-functions. Thus 
parsing is guaranteed to terminate with this grammar. 

It might be claimed that the example rule is not a part of any grammar for a natural language. 
It is unclear whether the distinction between weak and strong off-line parsability is relevant when 
"natural" grammars are concerned. Still, it is important when the formal, mathematical and 
computational properties of grammars are concerned. We believe that a better understanding of 
formal properties leads to a better understanding of "natural" grammars as well. Furthermore, 
what might seem unnatural today can be common practice in the future.
Chapter 3



AMALIA - An Abstract Machine
for Linguistic Applications



This chapter details the design and implementation of the abstract machine. Section 3.1 presents
the formal language in which specifications are input, including the type hierarchy, the grammar
and the lexicon. The machine is explained incrementally: section 3.2 describes its core engine,
dedicated to the unification of two feature structures. The engine is enveloped with control structures
to accommodate parsing in section 3.3. Optimizations and extensions are discussed in section 3.4,
and the actual implementation of the machine, including some comparative performance
analyses, is given in section 3.5.



3.1 The Input Language 
3.1.1 Type Specification 

A program (i.e., a grammar) must specify the type hierarchy and the appropriateness specification.
We adopt ALE's syntax (Carpenter, 1992a) for this specification: it is a sequence of statements of
the form:

t sub [t1, t2, ..., tn] intro [f1:r1, ..., fm:rm].

where $t_1, \ldots, t_n, r_1, \ldots, r_m$ are types, $f_1, \ldots, f_m$ are features and $n, m \geq 0$. If $m = 0$, the 'intro'
part is omitted. This statement, which is said to characterize $t$, means that $t_1, \ldots, t_n$ are all -
and the only - (immediate) subtypes of $t$ (i.e., for every $i$, $1 \leq i \leq n$, $t \sqsubseteq t_i$), and that $t$ has the
features $f_1, \ldots, f_m$ appropriate for it. Moreover, these features are introduced by $t$, i.e., they are not
appropriate for any type $t'$ such that $t' \sqsubset t$. Finally, the statement specifies that $Approp(t, f_i) = r_i$
for every $i$. Each type (except $\top$ and $\bot$) must be characterized by exactly one statement. The
arity of a type $t$, $Ar(t)$, is the number of features appropriate for it.

The full subsumption relation is the reflexive transitive closure of the immediate relation de- 
termined by the characterization statements. If this relation is not a bounded complete partial 
order, the specification is rendered invalid. The same is true in case it is not an appropriateness 
specification. 
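
To make the reading of characterization statements concrete, the following Python sketch parses statements of the above form and computes, for every type, the features appropriate for it (its own plus the inherited ones) and hence its arity. It is only an illustration under stated assumptions: the regular expression, the function names `parse` and `appropriate`, and the choice to ignore any refinement of feature values along the hierarchy are ours and are not taken from ALE or from the machine's compiler.

```python
import re

# The running example hierarchy, written in the statement syntax above.
SPEC = """
bot sub [g,d].
g sub [a,b] intro [f3:d].
a sub [c] intro [f1:bot].
c sub [] intro [f4:bot].
b sub [c,e] intro [f2:bot].
e sub [].
d sub [d1,d2].
d1 sub [].
d2 sub [].
"""

# Hypothetical pattern for one statement: t sub [...] (intro [...])? .
STATEMENT = re.compile(
    r"(\w+)\s+sub\s+\[([^\]]*)\](?:\s+intro\s+\[([^\]]*)\])?\s*\.")


def split_list(text):
    return [item.strip() for item in text.split(",") if item.strip()]


def parse(spec):
    """Return (immediate subtypes, introduced features) for every type."""
    subtypes, intro = {}, {}
    for name, subs, feats in STATEMENT.findall(spec):
        if name in subtypes:
            raise ValueError(f"type {name} is characterized twice")
        subtypes[name] = split_list(subs)
        intro[name] = dict(tuple(part.strip() for part in pair.split(":"))
                           for pair in split_list(feats))
    return subtypes, intro


def appropriate(subtypes, intro):
    """Features appropriate for each type: introduced ones plus inherited ones."""
    approp = {t: dict(fs) for t, fs in intro.items()}
    changed = True
    while changed:                       # push features down to a fixpoint
        changed = False
        for t, subs in subtypes.items():
            for s in subs:
                for f, r in approp[t].items():
                    if f not in approp[s]:
                        approp[s][f] = r
                        changed = True
    return approp


subtypes, intro = parse(SPEC)
approp = appropriate(subtypes, intro)
arity = {t: len(fs) for t, fs in approp.items()}   # Ar(t)
```

With the running example, the type c ends up with four appropriate features (f1 and f3 through a, f2 and f3 through b, and its own f4), while a and b have two each.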

We use the type hierarchy in figure 3.1 as a running example, where bot stands for $\bot$. The
type $\top$ is systematically omitted from type specifications.






bot sub [g,d].
g sub [a,b] intro [f3:d].
a sub [c] intro [f1:bot].
c sub [] intro [f4:bot].
b sub [c,e] intro [f2:bot].
e sub [].
d sub [d1,d2].
d1 sub [].
d2 sub [].



              c[f4:bot]        e
    a[f1:bot]       b[f2:bot]        d1    d2
           g[f3:d]                      d
                       bot



Figure 3.1: An example type hierarchy 



3.1.2 Rules and Grammars 

A grammar consists of a non-empty set of rules and a set of lexical entries which associate a feature 
structure with every word. Each rule is a sequence of feature structures of length greater than 1, 
with possible reentrancies among its elements, and a designated (last) element that is the rule's 
head. The rest of the elements in a rule form its body. We use multi-terms for representing rules 
and lexical entries; however, we employ a simple description language in which such terms are
expressible, compatible with the ALE input language (Carpenter, 1992a).

ALE's description language for feature structures is based on a logical language developed
by Kasper and Rounds (1986), extended to accommodate types, where path sharing is replaced by 
the notion of variables due to Smolka (198^ ). A TFS is described as a conjunction of specifications 



that might include its type, its features, along with their values, or a variable (whose name begins 
with a capitalized letter) that refers to it. Multiple occurrences of the same variable denote 
reentrancy. The syntax for specifying rules consists of a rule name, the reserved word 'rule', a 
description for the rule's head, the reserved symbol '===>' and then the elements of the rule's 
body, each preceded by the reserved word 'cat>'. Lexical items consist of the word itself, the
symbol '--->' and a description. ALE's descriptions can include several other features that are
not supported by AMALIA, most notably disjunctions and inequations. Refer to Carpenter
(1992a) for a formal definition of ALE's input language.



As an example, the specification of the example grammar of figure 2.5 is depicted in figure 3.2
below. Notice that ALE's syntax places the head of rules before the body. Comments are preceded
by '%'.



3.2 A TFS Unification Engine 

3.2.1 First-Order Terms vs. Feature Structures 

While TFSs resemble first-order terms (FOTs) in many aspects, it is important to note the dif- 
ferences between them. Most importantly, while FOTs are essentially trees, with possibly shared 
leaves, TFSs are directed graphs, within which variables can occur anywhere. Moreover, our sys- 
tem doesn't rule out cyclic structures, so that infinite terms can be represented, too. Two FOTs 
are mutually consistent only if they have the same functor and the same arity. TFSs, on the 
other hand, can be unified even if their types differ (as long as they have a non-degenerate LUB). 
Moreover, their arity can differ, and the arity of the unification result can be greater than that of 






%%%********************** Grammar Rules
%grammar

s_np_vp rule
(phrase,cat:s,agr:Agr,sem:(Sem,arg1:SubjSem))
===>
cat> (cat:(n,case:nom),agr:Agr,sem:pred:SubjSem),
cat> (cat:v,agr:Agr,sem:Sem).

np_v_np rule
(phrase,cat:v,agr:Agr,sem:Sem,sem:arg2:ObjSem)
===>
cat> (cat:v,agr:Agr,sem:Sem),
cat> (cat:(n,case:acc),sem:pred:ObjSem).

%%%********************** Lexical Entries
%lexicon

John --->
(word,cat:n,agr:(per:third,num:sg),sem:pred:john).
her --->
(word,cat:(n,case:acc),agr:(per:third,num:sg),sem:pred:she).
loves --->
(word,cat:v,agr:(per:third,num:sg),sem:pred:love).

Figure 3.2: An example grammar in ALE format 

any of the unificands. Consequently, many deviations from the original WAM were necessary in
our design. In the following sections we try to emphasize the points where such deviations were
made. We assume familiarity with basic WAM concepts in this section.

3.2.2 Processing Scheme 

AMALIA's engine is designed for unifying two TFSs: a program and a query. Many queries
(representing input Natural Language phrases) can be executed with respect to a given program 
(the grammar of the Natural Language). The program is compiled only once to produce machine 
instructions. Each query is compiled before its execution; the resulting code is executed prior to 
the execution of the compiled program. Execution of the instructions, produced by compiling a 
query, builds a graph representation of the feature structure denoted by the query in the machine's 
memory. The processing of a program produces code that, during run-time, unifies the feature 
structure denoted by the program with a query already resident in memory. The result of the 
unification is a new TFS, represented as a graph in the machine's memory. In what follows we 
interleave the description of the machine, the TFS language it is designed for and the compilation of 
programs in this language. In section 3.3, queries are extended to sequences of TFSs, representing
input strings, and programs are extended to sets of rules, i.e., grammars. 



3.2.3 Memory Representation of Feature Structures 



The major operation of AMALIA's engine is feature structure unification; therefore, the major
data structure of the machine is aimed at storing feature structures in an efficient way. Following
the WAM, we use a global, one-dimensional array of data cells called HEAP. A global register H
points to the top element of HEAP. Data cells are tagged: STR cells represent nodes, and store
their types, while REF cells represent arcs, and contain the address of their targets. The number
of arcs leaving a node of type $t$ is $Ar(t)$, fixed due to total well-typedness. Hence, we can keep the
WAM's convention of storing all the outgoing arcs from a node consecutively following the node.
Given a type $t$ and a feature $f$ that is appropriate for $t$, the position of the arc corresponding to
$f$ ($f$-arc) in any TFS of type $t$ can be statically determined; the value of $f$ can be accessed in one
step. This is a major difference from the approach presented in (Aït-Kaci and Di Cosmo, 1993); it
leads to a more time-efficient system without harming the elegance of the machine design.

Computations performed on the machine involve TFS unification, during which new structures
are built on top of the heap and existing structures might be modified. It is important to note
that STR cells differ from their WAM analogs in that they can be dereferenced when a type
becomes more specific. In such a case, a chain of REF cells leads to the dereferenced STR cell.

Thus, if a TFS is modified, only its STR cell has to be changed in order for all pointers to it to 
'feel' the modification automatically. The use of self-referential REF cells is different, too: there 
are no real (Prolog-like) variables in our system, and such cells stand for features whose values are 
temporarily unknown. 

One cell is required for every node and arc, so for representing a graph of $n$ nodes and $m$ arcs,
$n + m$ cells are needed. Of course, during unification nodes can become more specific and a chain
of REF cells is added to the count, but the length of such a chain is bounded by the depth of the
type hierarchy and path compression during dereferencing cuts it occasionally. As an example,
figure 3.3 depicts a possible heap representation of the TFS b(b([4]d,[4]),d).
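
Dereferencing with path compression, as mentioned above, can be sketched in Python as follows. This is an illustration only, under assumptions of our own: the heap is a Python list of (tag, value) pairs, addresses are list indices, and the helper names `deref` and `feature_value` are hypothetical. A REF cell that points to itself stands for a value that is still unknown and is treated as the end of a chain.

```python
def deref(heap, addr):
    """Follow REF cells from addr to the cell they finally stand for,
    then make every cell on the way point directly at it."""
    target = addr
    while heap[target][0] == "REF" and heap[target][1] != target:
        target = heap[target][1]
    while addr != target:                 # path compression
        next_addr = heap[addr][1]
        heap[addr] = ("REF", target)
        addr = next_addr
    return target


def feature_value(heap, node_addr, offset):
    """Value of the feature whose arc sits at a statically known
    (1-based) offset after the STR cell at node_addr."""
    return deref(heap, node_addr + offset)


# A node of type b at address 0 with two arcs. The first arc leads to a
# cell that was an STR cell and now, after its type became more specific,
# is a REF cell pointing at the new STR cell at address 4.
heap = [("STR", "b"), ("REF", 3), ("REF", 5),
        ("REF", 4), ("STR", "d1"), ("STR", "d")]
first = feature_value(heap, 0, 1)
second = feature_value(heap, 0, 2)
```

After the first access the arc cell at address 1 points directly at address 4, so later accesses to the same feature take a single step.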



tag:       STR  REF  REF  STR  REF  REF  STR  STR
contents:  b    X2   X3   b    X4   X4   d    d

Figure 3.3: Heap representation of the TFS b(b([4]d,[4]),d)



3.2.4 Flattening Feature Structures 

Before processing a TFS, its linear representation is transformed to a set of "equations", each
having a flat (nesting free) format, using a set of registers {Xi} that store addresses of TFSs in
memory. A register Reg[j] is associated with each tag [j] of a normal term (recall that a term is
normal only if all its types are tagged). The flattening algorithm is straightforward and similar to
the WAM's. The order of the equations corresponds to a depth-first, postorder search of the term,
where new registers are allocated for sub-terms before the sub-terms are processed. Figure 3.4
depicts examples of the equations corresponding to two TFSs.
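
One way to realize the flattening step is sketched below in Python. It is a minimal sketch under our own assumptions, not the compiler's actual code: terms are built from the hypothetical classes `Node` (a type, its sub-terms and an optional reentrancy tag) and `Ref` (a further occurrence of a tag), and an equation is returned as a triple of register number, type and argument registers. Registers for all sub-terms of a node are allocated before any of the sub-terms is processed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Ref:
    tag: int                      # further occurrence of a tagged sub-term


@dataclass
class Node:
    type: str
    args: List[Union["Node", Ref]] = field(default_factory=list)
    tag: Optional[int] = None     # reentrancy tag, if any


Equation = Tuple[int, str, List[int]]   # (register, type, argument registers)


def flatten(root: Node) -> List[Equation]:
    registers = {}                # tag or node identity -> register number
    equations: List[Equation] = []

    def register_of(term) -> int:
        key = ("tag", term.tag) if term.tag is not None else ("node", id(term))
        if key not in registers:
            registers[key] = len(registers) + 1
        return registers[key]

    def visit(node: Node) -> None:
        me = register_of(node)
        # Allocate registers for all sub-terms before processing any of them.
        arg_registers = [register_of(arg) for arg in node.args]
        equations.append((me, node.type, arg_registers))
        for arg in node.args:
            if isinstance(arg, Node):
                visit(arg)

    visit(root)
    return equations


# The term b(b([4]d,[4]),d) used as an example in this section.
term = Node("b", [Node("b", [Node("d", tag=4), Ref(4)]), Node("d")])
equations = flatten(term)
```

For the example term this yields X1 = b(X2,X3), X2 = b(X4,X4), X4 = d and X3 = d, in that order. Because a `Ref` is never visited, a tag that refers back to an enclosing node (a cyclic term) does not cause the traversal to loop.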



A third tag of heap cells, VAR, is introduced in section 3.4.1.
Linear representation:      Set of equations:

a([3]d1,[3])                X1 = a(X2,X2)
                            X2 = d1

b(b([4]d,[4]),d)            X1 = b(X2,X3)
                            X2 = b(X4,X4)
                            X4 = d
                            X3 = d

Figure 3.4: Feature structures as sets of equations



3.2.5 Processing of a Query 

When processing an equation of the form $X_{i_0} = t(X_{i_1}, X_{i_2}, \ldots)$, representing part of a query, two
different kinds of instructions are generated. The first is put_node t/n, $X_{i_0}$, where $n = Ar(t)$.
Then, for every argument $X_{i_j}$, an instruction of the form put_arc $X_{i_0}$, j, $X_{i_j}$ is generated.
Execution of the put_node instruction creates a representation of a node of type $t$ on top of the
heap and stores its address in $X_{i_0}$; it also increments H to leave space for the arcs of the newly
created node. Execution of the subsequent put_arc instructions fills this space with REF cells.

In order for put_arc to operate correctly, the registers it uses must be initialized. Since only
put_node sets the registers, one way of ensuring correctness is having all put_node instructions
executed before any put_arc instruction is. Hence, the machine maintains two separate streams
of instructions, one for put_node and one for put_arc, and executes all elements of the first before
moving to the other. This compilation scheme is called for by the cyclic character of TFSs: as
explained in (Aït-Kaci and Di Cosmo, 1993), the original single-streamed WAM scheme would fail
on cyclic terms.



The effect of the two instructions is given in figure 3.5. We use syntax similar to that of
(Aït-Kaci, 1991) for describing the effect of instructions; in particular, the arguments of an instruction
are listed succeeding its mnemonic. We use '<STR,t>' to denote an STR cell of type t and '<REF,a>'
to denote a REF cell pointing to the address a. Figure 3.6 lists the result of compiling the term
b(b([4]d,[4]),d). When this code is executed (first the put_node instructions, then the put_arc ones)
the resulting representation of the TFS in memory is the one shown above in figure 3.3.



put_node t/n,Xi ≡
    HEAP[H] ← <STR,t>;
    Xi ← H;
    H ← H + n + 1;

put_arc Xi,offset,Xj ≡
    HEAP[Xi+offset] ← <REF,Xj>;



Figure 3.5: The effect of the put instructions - put 
put_node b/2,X1     % X1 = b(
put_arc X1,1,X2     %        X2,
put_arc X1,2,X3     %        X3)
put_node b/2,X2     % X2 = b(
put_arc X2,1,X4     %        X4,
put_arc X2,2,X4     %        X4)
put_node d/0,X4     % X4 = d
put_node d/0,X3     % X3 = d

Figure 3.6: Compiled code for the query b(b([4]d,[4]),d)
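
The two-stream execution of put instructions can be simulated with a few lines of Python. The sketch below is illustrative and makes assumptions of its own: the heap is a Python list whose first address is 0, registers are kept in a dictionary, instructions are plain tuples, and the class name `QueryMachine` and function `run_query` are hypothetical.

```python
class QueryMachine:
    """Toy model of the heap and registers used when executing a query."""

    def __init__(self):
        self.heap = []            # HEAP; the register H is len(self.heap)
        self.reg = {}             # register name -> heap address

    def put_node(self, type_name, arity, xi):
        self.reg[xi] = len(self.heap)           # Xi <- H
        self.heap.append(("STR", type_name))    # HEAP[H] <- <STR,t>
        self.heap.extend([None] * arity)        # H <- H + n + 1

    def put_arc(self, xi, offset, xj):
        self.heap[self.reg[xi] + offset] = ("REF", self.reg[xj])


def run_query(code):
    """Execute all put_node instructions first, then all put_arc ones."""
    machine = QueryMachine()
    for _, type_name, arity, xi in (ins for ins in code if ins[0] == "put_node"):
        machine.put_node(type_name, arity, xi)
    for _, xi, offset, xj in (ins for ins in code if ins[0] == "put_arc"):
        machine.put_arc(xi, offset, xj)
    return machine


# The code of figure 3.6.
CODE = [
    ("put_node", "b", 2, "X1"),
    ("put_arc", "X1", 1, "X2"),
    ("put_arc", "X1", 2, "X3"),
    ("put_node", "b", 2, "X2"),
    ("put_arc", "X2", 1, "X4"),
    ("put_arc", "X2", 2, "X4"),
    ("put_node", "d", 0, "X4"),
    ("put_node", "d", 0, "X3"),
]
machine = run_query(CODE)
tags = [cell[0] for cell in machine.heap]
```

Running the instructions in their textual order instead would fail at the first put_arc, because X2 has not been set yet; separating the streams removes this dependency on ordering, which is what makes cyclic terms unproblematic.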



3.2.6 Compilation of the Type Hierarchy 

One of the reasons for the efficiency of our implementation is that it performs a major part of
the unification during compile-time: the type unification. The WAM's equivalent of this operation
is a simple functor and arity comparison. It is due to the nature of a typed system that this
check has to be replaced by a more complex computation. Efficient methods were suggested for
performing LUB computation at run time, relying on efficient encoding of types (see (Aït-Kaci
et al., 1989)). We compute LUBs only once, at compile time, using a simple transitive closure
computation. Since type unification adds information by returning the features of the unified 
type, this operation builds new structures, in our design, that reflect this addition. Moreover, the 
WAM's special register S is here replaced by a stack. S is used by the WAM to point to the next 
sub-term to be matched against, but in our design, as the arity of the two terms can differ, there 
might be a need to hold the addresses of more than one such sub-term. These addresses are stored 
in the stack (more details and an example are given below). 

When the type hierarchy is processed, the (full) subsumption relation is computed and checked
for bounded-completeness (using a straightforward implementation of Warshall's algorithm (Warshall,
1962)). Then, a table is generated which stores, for every two types $t_1, t_2$, the least upper
bound $t = t_1 \sqcup t_2$. In addition, this table also lists the arity of $t$, its features and their "origin":
whether they are inherited from $t_1$, $t_2$, both or none of them. Figure 3.7 graphically depicts the
LUB and appropriateness tables generated for the running example type hierarchy.
Figure 3.7: Type unification tables
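
The compile-time computation of the subsumption relation and of the LUB table can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the hierarchy is given as a dictionary of immediate subtypes, the names `subsumption_closure` and `lub_table` are hypothetical, and an inconsistent pair of types is recorded as None rather than as $\top$.

```python
# Immediate subtypes of the running example hierarchy.
SUBTYPES = {
    "bot": ["g", "d"], "g": ["a", "b"], "a": ["c"], "b": ["c", "e"],
    "c": [], "e": [], "d": ["d1", "d2"], "d1": [], "d2": [],
}


def subsumption_closure(subtypes):
    """below[s][t] is True iff s subsumes t (reflexive transitive closure)."""
    types = list(subtypes)
    below = {s: {t: s == t or t in subtypes[s] for t in types} for s in types}
    for k in types:                       # Warshall: allow paths through k
        for s in types:
            if below[s][k]:
                for t in types:
                    if below[k][t]:
                        below[s][t] = True
    return below


def lub_table(subtypes):
    """Least upper bound of every pair of types, or None if inconsistent."""
    below = subsumption_closure(subtypes)
    types = list(subtypes)
    table = {}
    for t1 in types:
        for t2 in types:
            upper = [u for u in types if below[t1][u] and below[t2][u]]
            least = [u for u in upper if all(below[u][v] for v in upper)]
            if upper and not least:       # not bounded complete
                raise ValueError(f"{t1} and {t2} have no least upper bound")
            table[t1, t2] = least[0] if least else None
    return table


LUB = lub_table(SUBTYPES)
```

The check that raises an error corresponds to rejecting a hierarchy that is not bounded complete: two types with common upper bounds must have a least one.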
Out of this table a series of abstract machine language functions is generated. The functions
are arranged as a two-dimensional array called unify_type, indexed by two types $t_1, t_2$. Each such
function receives one parameter, the address of a TFS on the heap. Recall that the machine is
designed to unify two feature structures, one of which is part of a program, represented as code,
and the other, which is part of the query, already resides on the heap. Each unify_type function
receives the address of this second unificand as a parameter. When executed, it builds on the
heap a skeleton for the unification result: an STR cell of the type $t_1 \sqcup t_2$, and a REF cell for each
appropriate feature of it.

Consider unify_type[t1,t2](addr) where addr is the address of some TFS, $A$ (of type $t_2$),
in memory. Let $t = t_1 \sqcup t_2$, and let $f$ be some feature appropriate for $t$. If $f$ is inherited from $t_2$
only, the value of the REF cell in the skeleton result is simply set to point to the $f$-arc in $A$. In
this case, a build_ref i instruction is generated, where i is the position of the feature $f$ in $t_2$. If $f$
is inherited from $t_1$ only, a self-referential REF cell is created in the result. But an indication that
the actual value for this cell is yet to be determined must be recorded. This is done by means of
the global stack S, every element of which is a pair <action,addr>, where action is either 'copy'
or 'unify'. In the case we describe, the action is 'copy' and the address is that of the REF cell.
Thus, the instruction that is generated is build_self_ref.

If $f$ is appropriate for both $t_1$ and $t_2$, a REF cell with the address of the $f$-arc in $A$ is created,
and a 'unify' cell is pushed onto the stack. The generated instruction is build_ref_and_unify
i, where i is the position of $f$ in $t_2$. Finally, if $f$ is introduced by $t$, a VAR cell is created,
with $t' = Approp(t, f)$ as its value, by the instruction build_var t' (VAR cells are explained
in section 3.4.1).

As an example, we list in figure 3.8 the resulting code for the unification of the two types a and b
of the running example. Since $a \sqcup b = c$, the first instruction of the function is build_str c. For
every feature that is appropriate for c an instruction is generated according to the rules described
above. Finally, a return instruction completes the function.



unify_type[a,b](b_addr)
    build_str(c);              % since a ⊔ b = c
    build_self_ref;            % the value of f1 is yet unknown.
    build_ref(1);              % f2 is the first feature of b,
    build_ref_and_unify(2);    % f3 is the second, and it still
                               % has to be unified with a.
    build_var(bot);            % f4 is a new structure.
    return;

Figure 3.8: unify_type[a,b]
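
The rules by which a unify_type function is generated from the origin of each feature can be sketched in Python. The sketch is an illustration under assumptions of our own: feature positions are 1-based and follow the alphabetical order of feature names (which happens to agree with the comments in figure 3.8), the appropriateness table is written out by hand for the three types involved, and the function name `unify_type_code` is hypothetical.

```python
# Appropriate features (with their value types) of the types involved.
APPROP = {
    "a": {"f1": "bot", "f3": "d"},
    "b": {"f2": "bot", "f3": "d"},
    "c": {"f1": "bot", "f2": "bot", "f3": "d", "f4": "bot"},
}


def unify_type_code(t1, t2, lub, approp):
    """Instruction list for unify_type[t1,t2], where lub is t1 ⊔ t2.

    ASSUMPTION: positions are 1-based and features are ordered by name.
    """
    feats1, feats2 = sorted(approp[t1]), sorted(approp[t2])
    code = [("build_str", lub)]
    for f in sorted(approp[lub]):
        in1, in2 = f in feats1, f in feats2
        if in1 and in2:                   # appropriate for both unificands
            code.append(("build_ref_and_unify", feats2.index(f) + 1))
        elif in2:                         # inherited from t2 only
            code.append(("build_ref", feats2.index(f) + 1))
        elif in1:                         # inherited from t1 only
            code.append(("build_self_ref",))
        else:                             # introduced by the LUB itself
            code.append(("build_var", approp[lub][f]))
    code.append(("return",))
    return code


code_a_b = unify_type_code("a", "b", "c", APPROP)
```

For the types a and b the generated list is build_str c, build_self_ref, build_ref 1, build_ref_and_unify 2, build_var bot and return, in line with figure 3.8.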

This example code is rather complex; often the code is much simpler: for example, when $t_2$ is
subsumed by $t_1$, nothing has to be done. As another example, if $t_1$ is subsumed by $t_2$, then only
additional features of the program term have to be added to $A$. For each such feature, a unify_feat
i instruction is generated, where i is the position of the feature. Another case is when $t_1$ and $t_2$
are not compatible: unify_type[t1,t2] returns 'fail'. This leads to a call to the function fail,
which aborts the unification. The effect of the type unification instructions is given in figure 3.9.



The notion of failure is elaborated in section 3.3.4; rather than aborting all operations, failure will indicate the 
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The special purpose register ADDR is used for passing the parameter; the exact details of control 
transfer mechanisms, including the effect of return, are straight-forward and won't be specified 
here. 



build_str t ≡
    HEAP[H] ← <STR,t>;
    H ← H + 1;

build_self_ref ≡
    HEAP[H] ← <REF,H>;
    push(copy,H);
    H ← H + 1;

build_ref_and_unify i ≡
    HEAP[H] ← <REF,ADDR+i+1>;
    push(unify,H);
    H ← H + 1;

build_var t ≡
    HEAP[H] ← <VAR,t>;
    H ← H + 1;

build_ref i ≡
    HEAP[H] ← <REF,ADDR+i+1>;
    H ← H + 1;

unify_feat i ≡
    push(unify,ADDR+i+1);





Figure 3.9: The effect of the type unification instructions 
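The effect of these instructions can be traced with a small simulation. The following is a minimal sketch in Python, not the machine's actual implementation: the class name TypeUnifier, the dictionary-based heap and the helper unify_type_a_b are assumptions made for illustration, and the ADDR+i+1 offset is copied from the figure as printed. It runs the instruction sequence generated for unify_type[a,b] and leaves the new cells on the heap and the pending actions on the stack.

```python
from dataclasses import dataclass, field


@dataclass
class TypeUnifier:
    """Toy heap machine for the type unification instructions (illustrative only)."""
    heap: dict = field(default_factory=dict)    # address -> (tag, value)
    stack: list = field(default_factory=list)   # pairs (action, address)
    H: int = 0                                   # next free heap address
    ADDR: int = 0                                # root of the structure in memory

    def _emit(self, cell):
        # Store a cell on top of the heap and advance the heap pointer.
        self.heap[self.H] = cell
        self.H += 1

    def build_str(self, t):
        self._emit(("STR", t))

    def build_self_ref(self):
        self.stack.append(("copy", self.H))      # value is filled in later
        self._emit(("REF", self.H))

    def build_ref(self, i):
        self._emit(("REF", self.ADDR + i + 1))   # offset as printed in the figure

    def build_ref_and_unify(self, i):
        self.stack.append(("unify", self.H))     # still has to be unified
        self._emit(("REF", self.ADDR + i + 1))

    def build_var(self, t):
        self._emit(("VAR", t))


def unify_type_a_b(machine):
    """The instruction sequence generated for unify_type[a,b] (hypothetical helper)."""
    machine.build_str("c")
    machine.build_self_ref()
    machine.build_ref(1)
    machine.build_ref_and_unify(2)
    machine.build_var("bot")
```

Running unify_type_a_b on a machine whose ADDR register is 100 builds five heap cells and leaves one 'copy' item and one 'unify' item on the stack, in that order.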



3.2.7 Processing of a Program 

The program is stored in a special memory area, the CODE area. Unlike the WAM, in our framework 
registers that are set by the execution of a query are not helpful when processing a program. The 
reason is that there is no one-to-one correspondence between the sub-terms of the query and the 
program, as the arities of the TFSs can differ. The registers are used, but (with the exception of
X1) their old values are not retained during execution of the program.

Three kinds of machine instructions are generated when processing a program equation of the
form X_{i_0} = t(X_{i_1}, ..., X_{i_n}). The first one is get_structure t/n,X_{i_0}, where n = Ar(t). For
each argument X_{i_j} of t an instruction of the form unify_variable X_{i_j} is generated if X_{i_j} is
first encountered; if it was already encountered, unify_value X_{i_j} is generated. For example, the
machine code that results from compiling the program a([3]d1,[3]) is depicted in figure 3.10. The
implementation of these three instructions is given in figure 3.11.



    get_structure a/2,X1       % X1 = a(
    unify_variable X2          %   X2,
    unify_value X2             %   X2)
    get_structure d1/0,X2      % X2 = d1

Figure 3.10: Compiled code for the program a([3]d1,[3])
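The compilation scheme for program equations can be sketched in a few lines of Python. This is an illustrative sketch only: the function name compile_program and the representation of equations as (register, type, argument registers) triples are assumptions, and only argument occurrences are counted as "encountered", following the description above.

```python
def compile_program(equations):
    """Emit get/unify instructions for a list of flattened program equations.

    Each equation is a triple (register, type, [argument registers]),
    standing for register = type(arguments).
    """
    code = []
    seen = set()                      # argument registers encountered so far
    for reg, typ, args in equations:
        code.append(f"get_structure {typ}/{len(args)},{reg}")
        for arg in args:
            if arg in seen:
                code.append(f"unify_value {arg}")
            else:
                seen.add(arg)
                code.append(f"unify_variable {arg}")
    return code


# The program a([3]d1,[3]): X1 = a(X2, X2), X2 = d1
example = [("X1", "a", ["X2", "X2"]), ("X2", "d1", [])]
```

Applied to the example equations, the function returns the four instructions listed in the figure, in the same order.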

The get_structure instruction is generated for a TFS Ap (of type t) which is associated with 
a register Xi. Execution of this instruction matches Ap against a TFS Aq that resides in memory, 
using Xi as a pointer to Aq. Since Aq might have undergone some type inference (for example, due 
to previous unifications caused by other instructions), the value of Xi must first be dereferenced. 

need for backtracking to an alternative solution.
get_structure t/n,Xi ≡
    addr ← deref(Xi); Xi ← addr;
    case HEAP[addr] of
        <REF,addr>:                         % uninstantiated cell
            HEAP[H] ← <STR,t>;
            bind(addr,H);                   % HEAP[addr] ← <REF,H>
            for j ← 1 to n do HEAP[H+j] ← <REF,H+j>;
            for j ← n downto 1 do push(<copy,H+j>);
            H ← H + n + 1;
        <STR,t'>:                           % a node
            if (unify_type[t,t'](addr) = fail) then fail;

unify_variable Xi ≡
    <action,addr> ← pop();
    Xi ← addr;

unify_value Xi ≡
    <action,addr> ← pop();
    case action of
        copy:  HEAP[addr] ← HEAP[Xi];
        unify: if (unify(addr,Xi) = fail) then fail;



Figure 3.11: Implementation of the get/unify instructions 



This is done by the function deref, which follows a chain of REF cells until one that does not point
to another, different REF cell is reached. The address of this cell is the value it returns.

The dereferenced value of Xi, addr, can either be a self-referential REF cell or an STR cell. 
In the first case, the TFS has to be built by executing the program. A new TFS is being built on 
top of the heap (using code similar to that of put_structure) with addr set to point to it. For 
every feature of this structure, a 'copy' item is pushed onto the stack. The second case, in which 
Xi points to an existing TFS of type t' , is the more interesting one. An existing TFS has to be 
unified with a new one whose type is t. Here the pre-compiled unify_type [t,t'] is invoked. 

To readers familiar with the WAM, the unify_variable instruction closely resembles its
WAM analog, in the read mode of the latter. There is no equivalent of the WAM's write mode
as there are no real variables in our system. However, in unify_value there is some similarity to
the WAM's modes, where the 'copy' action corresponds to write mode and the 'unify' action to
read mode. In this latter case the function unify is called, just like in the WAM. This function
(figure 3.12) is based upon unify_type. In contrast to unify_type, the two TFS arguments of
unify reside in memory, and full unification is performed. The first difference is the reason for 
removing an item from the stack S and using it as a part of the unification process; the second is 
realized by recursive calls to unify for subgraphs of the unified graphs. Notice that the function 
returns immediately if its arguments point to the same address, and binds its arguments otherwise. 
This guarantees correctness even in the face of cyclic structures. 
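The interplay of deref, bind and the recursive descent can be illustrated with a simplified Python sketch. Several simplifications are assumptions made here and are not part of the machine: the heap is a Python list of (tag, value) pairs, two types unify only when they are equal (a stand-in for the pre-compiled unify_type functions and their stack items), and the two roots are bound before the recursive calls rather than after them, which is one common way to keep the recursion finite on cyclic graphs.

```python
REF, STR = "REF", "STR"


def deref(heap, addr):
    """Follow REF cells until a self-referential REF or a non-REF cell is reached."""
    while heap[addr][0] == REF and heap[addr][1] != addr:
        addr = heap[addr][1]
    return addr


def bind(heap, addr, target):
    """Make the cell at addr point to target."""
    heap[addr] = (REF, target)


def unify(heap, arity, addr1, addr2):
    """Unify the graphs rooted at addr1 and addr2; arity maps a type to its feature count."""
    addr1, addr2 = deref(heap, addr1), deref(heap, addr2)
    if addr1 == addr2:
        return True
    if heap[addr1] == (REF, addr1):          # unbound cell: just bind it
        bind(heap, addr1, addr2)
        return True
    if heap[addr2] == (REF, addr2):
        bind(heap, addr2, addr1)
        return True
    t1, t2 = heap[addr1][1], heap[addr2][1]
    if t1 != t2:                             # simplified type unification (assumption)
        return False
    bind(heap, addr1, addr2)                 # bind first so that cycles terminate
    for i in range(1, arity[t1] + 1):        # feature i is stored i cells after the root
        if not unify(heap, arity, addr1 + i, addr2 + i):
            return False
    return True
```

After a successful call both roots dereference to the same address, so a later attempt to unify them returns immediately.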

When a sequence of instructions that were generated for some TFS is successfully executed on 
function unify(addr1, addr2: address): boolean;
begin
    addr1 ← deref(addr1); addr2 ← deref(addr2);
    if (addr1 = addr2) then return(true);
    if (HEAP[addr1] = <REF,addr1>) then
        bind(addr1,addr2); return(true);
    if (HEAP[addr2] = <REF,addr2>) then
        bind(addr2,addr1); return(true);
    t1 ← HEAP[addr1].type; t2 ← HEAP[addr2].type;
    if (unify_type[t1,t2](addr2) = fail) then return(false);
    for i ← 1 to Ar(t1) do
        <action,addr> ← pop();
        case action of
            copy:  HEAP[addr] ← <REF,addr1+i>;
            unify: if (not(unify(addr,addr1+i)))
                       then return(false);
    bind(addr1,addr2);
    return(true);
end;



Figure 3.12: The code of the unify function 



some query, the result of the unification of both structures is built on the heap and every register 
Xi stores the value of its corresponding node in this graph. The stack S is empty. 



3.3 Parsing 

The previous section delineated the core engine of AMALIA; this section shows how it is extended
with control instructions to accommodate parsing. This constitutes the major difference between
AMALIA and abstract machines that were devised for variants of Prolog: computations
performed on AMALIA amount to parsing with respect to the input grammar, as opposed to
SLD resolution.



The parsing process described in chapter 2 above is a generic, abstract one: there is no
specification of the order in which new items are computed during each application of T_{G,w}. When
designing the control mechanisms of the machine, several parameters have to be explicated and
their values determined. In what follows we describe how the machine (and a compiler for it)
are designed to allow for efficient implementation of parsing, that is, computation of the least
fix-point of T_{G,w} for a given grammar G and an input string of lexical elements w of length n.
The control modules of AMALIA are motivated by the abstract process of chart parsing and are
not, in general, inspired by the specific TFS-based formalism that we deal with. For example, the
machine can also be used for parsing with respect to "plain" context-free grammars.



Notice that cases 2.3, 2.4 and 2.5 of T_{G,w} are independent of the argument I and add the same
items in every application of the operator. Therefore, when computing the least fix-point of the
operator, they are computed only once, when the process is initiated. Cases 2.1 and 2.2 are more
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interesting. We treat completion as a special case of dot movement where the dot is moved from 
the penultimate position to the final one, so that completion can take place immediately after
the final application of dot movement. Dot movement creates an item on the basis of two items
in I, an active one [l, A, m, Act] and a complete one [m, B, r, Comp], where l < m < r.^ Since
for every off-line parsable grammar the number of items that span a particular substring of the
input is finite, it is possible first to generate the items spanning (m,r), for all m > l, and all the
items spanning (l,m), where m < r, and only then the items spanning (l,r). This is the invariant
underlying our design. When generating items that span (l,r), the active items that span (l,m)
are combined with complete items that span (m,r), where m decreases from r - 1 to l. We use a
chart to store generated items, since they may be used more than once.



3.3.1 A Parsing Algorithm 

A chart of size n is a data structure that can be accessed by a key that is a triple (l,m,r), where
1 ≤ r ≤ n, 0 ≤ l ≤ r - 1 and l < m ≤ r - 1. Given these restrictions, a chart of size n can be
accessed by $\sum_{r=1}^{n} \sum_{l=0}^{r-1} \sum_{m=l+1}^{r-1} 1 = \sum_{r=1}^{n} \sum_{l=0}^{r-1} (r-l-1) = \sum_{r=1}^{n} r(r-1)/2 = (n^3 - n)/6$ different
keys. Keys are linearly ordered as follows: (l,m,r) ≺ (l',m',r') iff r < r' or ((r = r') and (l > l'))
or ((r = r') and (l = l') and (m < m')). Each element of the chart is a pair of chart entries. Such
pairs are accessed by the coordinates of the key: the element indexed by (l,m,r) is a pair indexed
by ((l,m),(m,r)). Additionally, if two elements have matching sub-keys then it is required that
the corresponding elements of the pairs be identical: the element indexed by (l,m) in (l,m,r)
must be identical to the element indexed by (l,m) in (l,m,r'). Therefore, even though there exist
(n^3 - n)/6 different keys by which 2 × (n^3 - n)/6 chart entries can be accessed, there are only
n(n + 1)/2 different chart entries.

Each chart entry contains two lists of edges: active and complete. Each list, both active and 
complete ones, is a sequence of edges along with a specified current edge, on which the following 
operations are defined: 

new(list): return an empty list;
add_edge(list, edge): add edge to the end of list;
init(list): set the current edge of list to be the first edge, if there is one;
advance(list): set the current edge in list to be the next edge, if there is one;
current(list): return the current edge of list;
exhausted(list): return true iff the current element in list is undefined.
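The list operations above map directly onto a sequence with a cursor. The following Python sketch is one possible realization, given only for illustration; the class name EdgeList and the use of an integer cursor are assumptions. An index-based cursor has the convenient property that edges appended while the list is being traversed are still visited.

```python
class EdgeList:
    """A sequence of edges with a designated current edge (illustrative sketch)."""

    def __init__(self):
        # new(list): an empty list; the current edge is undefined.
        self.edges = []
        self.pos = 0

    def add_edge(self, edge):
        # Add edge to the end of the list.
        self.edges.append(edge)

    def init(self):
        # Set the current edge to be the first edge, if there is one.
        self.pos = 0

    def advance(self):
        # Move to the next edge; past the last edge the current edge is undefined.
        if self.pos < len(self.edges):
            self.pos += 1

    def current(self):
        # Return the current edge, or None when it is undefined.
        return None if self.exhausted() else self.edges[self.pos]

    def exhausted(self):
        # True iff the current edge is undefined.
        return self.pos >= len(self.edges)
```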

An edge can be either active or complete. An active edge stems from some grammar rule; it 
contains a part that was already scanned, and a part that is left to be seen. The position between 
the two parts is indicated by the location of the dot. A complete edge is a result of scanning an 
entire body of some rule, and constructing the rule's head. 



The parsing process is outlined informally in figure 3.13: (a) shows the order in which chart 
entries are constructed; (b) shows the order in which chart entries are scanned to construct the 
[left, right] entry; combination of chart entries is performed as described in (c). The heart of 
the process is dot movement, which creates a new edge e by unifying the TFS that immediately 
succeeds the dot in an active edge el with some complete edge e2. 

^We use l, m, r for left, mid and right, respectively.
a. Building chart entries

    Procedure main
        for right ← 1 to n do
            for left ← right-1 downto 0 do
                build_chart_entry(left, right)

b. Constructing one chart entry

    Procedure build_chart_entry(left, right)
        for mid ← right-1 downto left do
            combine(left, mid, right)

c. Combining two entries

    Procedure combine(left, mid, right)
        for every active edge e1 in chart[left,mid]
            for every complete edge e2 in chart[mid,right]
                e ← dot_movement(e1, e2)
                chart[left,right] ← chart[left,right] ∪ {e}

Figure 3.13: Parsing - informal description 
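Since the control regime does not depend on feature structures, it can be tried out on a plain context-free grammar. The Python sketch below follows the three procedures of the figure under several assumptions of its own: categories are atomic strings, dot movement is simple equality of categories, rules are (head, body) pairs, and unit rules are listed before other rules with feeding rules first. The function name parse and the dictionary-based chart are illustrative choices, not the machine's data structures.

```python
from collections import defaultdict


def parse(rules, lexicon, words):
    """Chart-parse words; returns a map (l, r) -> list of complete categories."""
    n = len(words)
    active = defaultdict(list)      # (l, r) -> [(rule index, dot position)]
    complete = defaultdict(list)    # (l, r) -> [category]
    for i, word in enumerate(words):            # complete edges for lexical entries
        complete[i, i + 1].extend(lexicon[word])
    for i in range(n):                          # dot-initial active edges
        for k in range(len(rules)):
            active[i, i].append((k, 0))

    def combine(left, mid, right):
        for k, dot in active[left, mid]:
            head, body = rules[k]
            j = 0
            # Index loop: when left == mid, unit rules append to this very list.
            while j < len(complete[mid, right]):
                if complete[mid, right][j] == body[dot]:
                    if dot + 1 == len(body):            # completion
                        if head not in complete[left, right]:
                            complete[left, right].append(head)
                    elif (k, dot + 1) not in active[left, right]:
                        active[left, right].append((k, dot + 1))
                j += 1

    for right in range(1, n + 1):                       # procedure main
        for left in range(right - 1, -1, -1):
            for mid in range(right - 1, left - 1, -1):  # build_chart_entry
                combine(left, mid, right)
    return complete
```

For a grammar with the rules S from NP VP and VP from V NP and a three-word input such as "John loves Mary", the entry (0, 3) of the returned chart contains S. No agenda is needed; the order of the loops alone guarantees that every edge is available when it is used.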

The last part of the process requires some precaution: application of dot movement to an active
edge in (l,m) and a complete one in (m,r) results in a new edge in (l,r). This edge might be
complete; in the special case where l = m, the complete edge that is thus created is added to
(l,r) = (m,r). Such edges can now be combined with active edges in (l,m) again. Notice that the
situation occurs only when l = m. The only way an active edge in the (l,l) entry of the chart (that
is, an edge with the dot in the initial position) can become complete is if the rule on which the
edge is based is of length 2 (a unit rule). Therefore, for unit rules a special treatment is required:
first, active edges that originate from unit rules are placed before edges that stem from other rules
within the same chart entry. This guarantees that complete edges that result from application
of dot movement on (an active edge that stems from) a unit rule are constructed before they are
needed. The only problem left is the possibility of more than one unit edge in the same chart
entry. Such rules are required to be ordered by the grammar writer, such that if ρ1 can "feed"
ρ2, ρ1 precedes ρ2 in the grammar.^

The advantage of this parsing algorithm is that it is simple; in particular, it does not require an 
agenda. In the above description we assumed that the chart is initialized with complete edges for 
the lexical entries of the input words, and with active edges - with the dot in the initial position 

^Such an ordering must always exist for off-line parsable grammars. 
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- for the rules. 

Following this informal description, several observations about the parsing process can be made: 

• After the edges in the [I, r] chart entry are constructed, no more edges will be added to entries 
that were constructed earlier; 

• When complete edges of column c are used, no complete edges of columns 1, . . . , c — 1 will 
be used any more. Consequently, at every point during the process, the complete edges in 
chart entries whose second index is less than r will not be used any more; 

• active edges may be used over and over again. 

These observations guide the specification of the machine architecture, which is divided into data 
structures and machine instructions. 



3.3.2 Data Structures 

The chart is represented as a two-dimensional array, indexed by two integers, where chart[l,r]
refers to a chart entry. Each chart entry contains two lists of edges, which are referred to as
chart[l,r].active and chart[l,r].complete.

A complete edge is represented in memory as a structure e containing an address, e.addr, of
some HEAP cell that is the root of a feature structure. An active edge is represented as a structure
containing a pointer, e.label, and a set of register values, e.regs. An active edge always stems
from some grammar rule; it records a major part of the state of the machine at some given time.
e.label is the address (in CODE) of the first instruction in the code that is generated for the TFS
immediately following the dot of the edge; e.regs records the values of the machine registers. The
registers represent the part of the edge prior to the dot, whereas the pointer represents the part
following it.^ During a computation of AMALIA, edges are repeatedly stored in the chart and
loaded from it. When an edge e is loaded, e.label is used to determine the next instruction
that is to be executed; this implies an implicit branch whenever an active edge is loaded (see the
effect of the call instruction below). The auxiliary function make_edge creates an edge from its
components (an address and a set of values for the registers).

Special purpose registers record the current values of the chart indices LEFT, RIGHT and MID and
the input length LEN. Like all the machine's data structures, the control structures are initialized
before every execution of a program: the registers LEFT, MID, RIGHT and LEN are set to 0 and new
is applied to every chart entry. The control structures are affected by the execution of machine
instructions that are generated for both the program and the query, as explained below.



3.3.3 Compilation 

Compilation of a grammar produces code that, when executed, realizes the parsing process with 
respect to the grammar and an input query (that is a sequence of TFSs) representing the Natural 
Language input. In this section we describe the compilation scheme in terms of the resulting code. 
First, the chart is initialized. Three sets of edges have to be inserted to the chart, in
correspondence to the three cases of T_{G,w} (definition 2.9.2) that are independent of the input: complete
edges for lexical items, active edges (with the dot in the initial position) for rules and complete
edges for ε-rules.

^The machine's stack is always empty after the execution of code that was generated for a feature structure.
Therefore, the stack does not have to be included in the state.
The first set, corresponding to case 2.5 of T_{G,w}, is inserted to the chart by processing the query.
Recall that processing a query results in the generation of machine code that is executed prior
to the execution of the program code. When the query is composed of more than one feature
structure, the generated code contains proceed instructions that are inserted after every feature
structure in the query. The effect of proceed is given in figure 3.14; the parameter Xi is the
register that points to the root of the feature structure that was just created (that is, the first
register mentioned in the code that immediately precedes the proceed instruction). Processing
of the query results in edges that are added to the [i-1,i] diagonal of the chart; in addition, the
value of LEN is set to the length of the input.

proceed Xi ≡
    LEN ← LEN + 1;
    add_edge(make_edge(Xi,null), chart[LEN-1,LEN].complete);

Figure 3.14: The effect of the proceed instruction



The second set, corresponding to case 2.3, is added by means of specific machine instructions,
put_rule L, that are generated for each of the rules in the input grammar, where L is the label
of the first instruction of the compiled rule. put_rule adds an active edge for the rule starting
at address L, with no register bindings, to the [i,i] entries of the chart (figure 3.15). Those
instructions are placed by the compiler prior to any other instruction of the program.



put_rule L ≡
    for i ← 0 to LEN do
        add_edge(make_edge(L,null), chart[i,i].active);



Figure 3.15: The effect of put_rule 
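Chart initialization by proceed and put_rule can be sketched as follows. This is a hedged illustration in Python rather than the machine's code: the names Edge, ChartEntry and Control are hypothetical, a single Edge class stands in for both kinds of edge, and the loop bounds 0..LEN used in put_rule are an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Edge:
    first: object           # heap address (complete edge) or code label (active edge)
    regs: object = None     # saved register values; None when there are no bindings


@dataclass
class ChartEntry:
    active: list = field(default_factory=list)
    complete: list = field(default_factory=list)


class Control:
    """Control registers and chart, reduced to what proceed and put_rule touch."""

    def __init__(self):
        self.LEN = 0
        self.chart = defaultdict(ChartEntry)    # (l, r) -> ChartEntry

    def proceed(self, xi):
        # One more input word: add its complete edge on the [i-1,i] diagonal.
        self.LEN += 1
        self.chart[self.LEN - 1, self.LEN].complete.append(Edge(xi))

    def put_rule(self, label):
        # Add a dot-initial active edge for the rule to every [i,i] entry.
        for i in range(self.LEN + 1):
            self.chart[i, i].active.append(Edge(label))
```

Because the query code runs before the program code, LEN already holds the input length by the time the put_rule instructions are executed.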



Rules of length one (ε-rules, or empty categories), corresponding to case 2.4, are processed by
the compiler in a special way that is described in section 3.4.3.



The main product of the compiler is the code that corresponds to dot movement and completion
(cases 2.1 and 2.2). For a rule with body A1, A2, ..., An and head A0 the compiler generates the code
given in figure 3.16, where ri is the index of the first register that is mentioned in the code for Ai,
and where X is the first register mentioned in the code for A0. The effect of this code is discussed
in section 3.3.4.



Controlling the order in which chart entries are constructed is independent of the grammar. 
Consequently, the compiler generates a few pieces of identical code for every grammar. On the 
basis of the generated code for one rule, the code for a grammar consisting of k rules is given in 
figure 3.17 . 

An important observation regarding the control flow of a compiled program has to be made 
here. In general it is impossible to determine, during compile time, the order in which machine 
instructions will be executed. Indeed, some control instructions (in particular, the key manipu- 
lation instructions) resemble ordinary conditional branches. However, other instructions (call, 
copy_active_edge and copy_complete_edge) 'hide' implicit branches to addresses that are only 
known at run time. Further details are given below. 
        put_rule L1                   ; add active edges to the main diagonal
        ...
L1:     load_fs r1
        [ (program) code for A1 ]
        copy_active_edge
        ...
        load_fs ri                    ; Xri ← ADDR
        [ (program) code for Ai ]
        copy_active_edge
        ...
        [ (program) code for An ]
        [ (query) code for A0 ]
        copy_complete_edge X







Figure 3.16: Compiled rule 



To conclude this section, figure 3.18 depicts (part of) the code that was generated by the
compiler for the example grammar of figure 3.2. Lines 2-11 contain the constant, grammar independent
code; lines 12-49 list the code that was generated for the first rule; and the code on the right side
was generated for the lexical entries of the words "John" and "love".



3.3.4 Effect of the Machine Instructions 

This section details the effect of the control module machine instructions. While the effect of each 
instruction is given independently of its context, we assume throughout the description that a 
sequence of instructions is present that were generated by the compiler for some grammar. 

The instructions first_key, next_key and check_key, constantly generated for every grammar,
are aimed at implementing the outermost control flow during parsing. Motivated by the invariant
stated in section 3.3.1 above, these instructions form a loop that causes AMALIA to build all
the necessary chart entries in the order specified there. The body of the loop contains code whose
effect corresponds to the procedure combine of figure 3.13. The effect of the key-manipulation
instructions is depicted in figure 3.19.



Every iteration of the main loop corresponds to the combination of two chart entries, taking 
the active edges from the entry indexed by the values of LEFT and MID and the complete edges 
from the entry indexed by the values of MID and RIGHT. To loop over all the active edges (in 
the designated chart entry), two instructions are used: tst_active_edges and next_active_edge. 
The instructions tst_complete_edges and next_complete_edge scan the complete edges in the 
chart entry that is indexed by the values of MID and RIGHT. 

The effect of these four instructions is given in figure 3.20. The two instructions that loop over
the active edges are straightforward; the other two are quite similar, with a few differences. First,
tst_complete_edges initializes the list of complete edges in the [MID,RIGHT] chart entry. Second,
next_complete_edge calls the auxiliary function reset_trail, whose purpose will be discussed
presently.
            put_rule L1
            ...
            put_rule Lk
            first_key
Lstart:     next_key
Lact:       tst_active_edges Lend
Lcomp:      tst_complete_edges L'comp
            load_machine_state
            call
            next_complete_edge Lcomp
L'comp:     next_active_edge Lact
Lend:       check_key Lstart
            end_program
L1:         code for 1st rule
            ...
Lk:         code for k-th rule



Figure 3.17: Compiled code for a grammar with k rules 



call (figure 3.21) sets the stage for the operation of dot movement. Let e1 be the current active
edge in the [LEFT,MID] entry of the chart, and e2 the current complete edge in the [MID,RIGHT]
entry. The address in the code area of the next instruction to be executed is stored in e1.label;
this code is to be executed on the feature structure pointed to by e2.addr. In other words, the code
that was generated for the next element of e1, viewed as a procedure call, is to be executed on the
complete edge e2, whose address is viewed as a parameter. call loads the registers' values from
e1, and saves e2.addr in the special purpose register ADDR, which is used for passing parameters.
Then, it saves the address to return to (that is, the address of the instruction following the call)
in a special purpose stack of return addresses and branches to e1.label.

The first instruction that is executed in a 'procedure' is load_fs X, which loads the value
stored in ADDR onto the register X. Then, the instructions that were generated for this part of the
rule are executed in order, thus unifying the TFS immediately after the dot (in the active edge)
with the TFS that X points to. If the unification succeeds, control flows to the copy_active_edge
instruction that adds the newly created MRS to the chart, and returns the control to the address
stored in the stack of return addresses. If the entire body of the rule was consumed, the last
instruction is copy_complete_edge (see figure 3.16), which adds the newly created complete edge
to the chart and returns. The auxiliary function copy_mrs copies the MRS accessible from the
current registers on top of the heap. copy_fs X copies the feature structure rooted in X on top
of the heap. The effect of these three instructions is depicted in figure 3.22.



When the code that is associated with some program feature structure is executed, the heap 
is modified. Sometimes the same code has to be executed on several TFSs (since one active edge 
might be combined with several complete ones). If the unification fails, that is, fail is called, the 
heap must be restored to its original form. To this end a new data structure is introduced: the trail. 
It is an array whose contents are pairs of the form <address,value>, which record modifications 



 0       put_rule L6 (1)                0  L91: put_node word, X1
 1       put_rule L8 (2)                1       put_node n, X2
 2       first_key                      2       put_node case, X5
 3  L1:  next_key                       3       put_node agr, X3
 4  L2:  tst_active_edges L5            4       put_node third, X6
 5  L3:  tst_complete_edges L4          5       put_node sg, X7
 6       call                           6       put_node sem, X4
 7       next_complete_edge L3          7       put_node John, X8
 8  L4:  next_active_edge L2            8       put_node atom, X9
 9  L5:  check_key L1                   9       put_node atom, X10
10       end_of_program                10       put_arc X1,1,X2
11  L6:  load_fs X1                    11       put_arc X1,2,X3
12       get_structure sign, X1        12       put_arc X1,3,X4
13       unify_variable X2             13       put_arc X2,1,X5
14       unify_variable X3             14       put_arc X3,1,X6
15       unify_variable X4             15       put_arc X3,2,X7
16       get_structure n, X2           16       put_arc X4,1,X8
17       unify_variable X5             17       put_arc X4,2,X9
18       get_structure nom, X5         18       put_arc X4,3,X10
19       get_structure agr, X3         19       proceed X1
20       unify_variable X6
21       unify_variable X7             40  L93: put_node word, X1
22       get_structure per, X6         41       put_node v, X2
23       get_structure num, X7         42       put_node agr, X3
24       get_structure sem, X4         43       put_node third, X5
25       unify_variable X8             44       put_node sg, X6
26       unify_variable X9             45       put_node sem, X4
27       unify_variable X10            46       put_node love, X7
28       get_structure atom, X8        47       put_node atom, X8
29       get_structure atom, X9        48       put_node atom, X9
30       get_structure atom, X10       49       put_arc X1,1,X2
31       copy_active_edge L7           50       put_arc X1,2,X3
32  L7:  load_fs X11                   51       put_arc X1,3,X4
33       get_structure sign, X11       52       put_arc X3,1,X5
34       unify_variable X12            53       put_arc X3,2,X6
35       unify_value X3                54       put_arc X4,1,X7
36       unify_variable X13            55       put_arc X4,2,X8
37       get_structure v, X12          56       put_arc X4,3,X9
38       get_structure sem, X13        57       proceed X1
39       unify_variable X14
40       unify_value X8
41       unify_variable X15
42       get_structure atom, X14
43       get_structure atom, X15
44       put_node phrase, X16
45       put_node s, X17
46       put_arc X16,1,X17
47       put_arc X16,2,X3
48       put_arc X16,3,X13
49       copy_complete_edge 16











Figure 3.18: Compiled code obtained for the example grammar 



to HEAP cells. Pairs are being added to the trail by means of the bind function, whenever the 
value of a heap cell is modified. If all the unifications are successful, and control flows naturally to 
next_complete_edge, the trail is reset using the auxiliary function reset_trail. 

Consider now the case where some unification fails. The effect of the fail function has to be
modified: failure of a "local" unification no longer means termination of the program; rather, it
indicates the need to try different edges to combine. Failure can be detected during the execution
of any of the instructions in the program code. In this case, the previous bindings are undone,
using a call to unwind_trail, and the stack is initialized using reset_stack. Then, a branch is
made to the last tst_complete_edges instruction executed. This instruction's address is stored in
the special purpose register RETURN_ADDR. The definition of fail is given in figure 3.23.
first_key ≡
    RIGHT ← 0;
    LEFT ← -1;
    MID ← -1;

next_key ≡
    MID ← MID - 1;
    if (MID < LEFT) then
        LEFT ← LEFT - 1;
        if (LEFT < 0) then
            RIGHT ← RIGHT + 1;
            LEFT ← RIGHT - 1;
        MID ← RIGHT - 1;
    init(chart[LEFT,MID].active);
    init(chart[MID,RIGHT].complete);

check_key l ≡
    if (RIGHT ≠ LEN or LEFT ≠ 0 or MID ≠ LEFT) then
        branch l;



Figure 3.19: The effect of the key-manipulation instructions 
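The loop formed by the key-manipulation instructions can be checked against the nested loops of the informal parsing procedure. The Python sketch below simulates the three registers; it assumes that the conditionals of next_key nest so that MID is reset whenever LEFT changes, and that the input length is at least 1. The function names key_sequence and nested_loops are illustrative.

```python
def key_sequence(n):
    """Simulate first_key, next_key and check_key for an input of length n >= 1."""
    right, left, mid = 0, -1, -1              # first_key
    keys = []
    while True:
        mid -= 1                              # next_key
        if mid < left:
            left -= 1
            if left < 0:
                right += 1
                left = right - 1
            mid = right - 1
        keys.append((left, mid, right))       # loop body: combine(left, mid, right)
        # check_key branches back unless the last key (0, 0, n) was just processed.
        if not (right != n or left != 0 or mid != left):
            return keys


def nested_loops(n):
    """The same keys, enumerated by the loops of the informal procedure."""
    return [(left, mid, right)
            for right in range(1, n + 1)
            for left in range(right - 1, -1, -1)
            for mid in range(right - 1, left - 1, -1)]
```

Under these assumptions both functions enumerate the same keys in the same order, ending with the key (0, 0, n).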



tst_active_edges l ≡
    if exhausted(chart[LEFT,MID].active) then
        branch l;

next_active_edge l ≡
    advance(chart[LEFT,MID].active);
    branch l;

l: tst_complete_edges l' ≡
    if exhausted(chart[MID,RIGHT].complete) then
        init(chart[MID,RIGHT].complete);
        branch l';

next_complete_edge l ≡
    reset_trail;
    advance(chart[MID,RIGHT].complete);
    branch l;



Figure 3.20: The effect of the edge traversal instructions 
l: call ≡
    registers ← current(chart[LEFT,MID].active).regs;
    ADDR ← current(chart[MID,RIGHT].complete).addr;
    push_return_addr(l + 1);
    branch current(chart[LEFT,MID].active).label;



Figure 3.21: The effect of the call instruction 



load_fs r ≡
    Xr ← ADDR;

copy_active_edge l ≡
    copy_mrs;
    add_edge(make_edge(l,registers), chart[LEFT,RIGHT].active);
    branch pop_return_address();

copy_complete_edge X ≡
    copy_fs(X);
    add_edge(make_edge(X,null), chart[LEFT,RIGHT].complete);
    branch pop_return_address();



Figure 3.22: The effect of the copy instructions 



The WAM uses a trail to undo 'side effects' on the stack and the heap upon backtracking to a
choice point (see (Aït-Kaci, 1991, chapter 4.2)). In AMALIA no backtracking is performed and
so the trail could have been eliminated. Notice that after execution of program code, the newly
created edge (whether active or complete) is copied onto the heap. A different strategy could have 
been chosen, in which the active edge is copied prior to the execution of the program code. In this 
case, all that has to be done upon failure is restoring the value of the heap pointer H, so that the 
cells that were used by (ineffective) instructions can be re-used. While the gain in this strategy is 
that no trail is needed, it doesn't seem to be too effective: active edges would have to be copied 
before they are used, which means that many MRSs will be copied in vain. Since copying is one 
of the most time-consuming operations, we opt for the method described above. 
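The role of the trail in this section can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the machine's implementation: the class name TrailedHeap is hypothetical, and the value stored in each trail pair is taken to be the old contents of the modified cell, which is what undoing a binding requires.

```python
class TrailedHeap:
    """A heap whose bindings are recorded on a trail so that they can be undone."""

    def __init__(self, cells):
        self.heap = list(cells)
        self.trail = []                     # pairs (address, old value)

    def bind(self, addr, target):
        # Record the old contents before overwriting the cell.
        self.trail.append((addr, self.heap[addr]))
        self.heap[addr] = ("REF", target)

    def reset_trail(self):
        # Unification succeeded: the recorded bindings become permanent.
        self.trail.clear()

    def unwind_trail(self):
        # Unification failed: restore the cells in reverse order of modification.
        while self.trail:
            addr, old = self.trail.pop()
            self.heap[addr] = old
```

After a failed unification unwind_trail leaves the heap exactly as it was before the program code was entered, so the same active edge can be tried against the next complete edge.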



3.4 Optimizations and Extensions 
3.4.1 Lazy Evaluation of Feature Structures 

One of the drawbacks of maintaining total structures is that when two TFSs are unified, the values
of features that are introduced by the unified type have to be built. For example, unify_type[a,b]
(figure 3.8) has to build a TFS of type bot, which is the value of the f4 feature of type c. This






procedure fail; 
unwind_trail ; 
reset_stack; 

branch pop_return_address () ; 



Figure 3.23: The fail function 



is expensive in terms of both space and time; the newly built structure might not be used at all. 
Therefore, it makes sense to defer it. 

To optimize the design in this respect, a new kind of heap cell, the VAR cell, is introduced. A
VAR cell whose contents is a type t stands for the most general TFS of type t. VAR cells are
generated by the various unify_type functions for introduced features; they are expanded only
when the explicit values of such features are needed: either during the execution of get_structure,
where the dereferenced value is a VAR cell, or during unify.^ In both cases the TFS has to be
built, by executing the pre-compiled function build_most_general_fs with the contents
of the VAR cell as an argument. This function (which is automatically generated by the type
hierarchy compiler) builds a TFS of the designated type on the heap, with VAR cells instead of
REF cells for the features. These cells will, again, only be expanded when needed. We thus obtain
a lazy evaluation of TFSs that weakly resembles Götz's notion of unfilled feature structures (Götz,
1994). Moreover, we gain another important property, namely that our type hierarchies can now
contain loops, since appropriateness loops can only cause non-termination when introduced features
are fully constructed. This approach might not be applicable in the presence of type constraints,
which are currently not supported by AMALIA.
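
The following Python sketch illustrates the idea under simplifying assumptions. The appropriateness table `APPROP`, the tuple encoding of heap cells and the functions `new_var` and `expand` are hypothetical; `expand` only plays the role that the text assigns to build_most_general_fs. The table deliberately contains an appropriateness loop, which is harmless because each expansion builds one level only.

```python
# Hypothetical appropriateness specification: type -> {feature: value type}.
# The loop (the tl of a list is again a list) would make eager construction diverge.
APPROP = {
    "bot": {},
    "a": {},
    "list": {"hd": "a", "tl": "list"},
}

# Each heap cell is a tuple: ("VAR", type) or ("STR", type, {feature: address}).
heap = []


def new_var(type_name):
    """Allocate a VAR cell standing for the most general TFS of the type."""
    heap.append(("VAR", type_name))
    return len(heap) - 1


def expand(addr):
    """Replace a VAR cell by an STR cell whose features are fresh VAR cells."""
    cell = heap[addr]
    if cell[0] == "VAR":
        type_name = cell[1]
        features = {f: new_var(t) for f, t in APPROP[type_name].items()}
        heap[addr] = ("STR", type_name, features)
    return heap[addr]


root = new_var("list")
cell = expand(root)            # builds one level only
tail = expand(cell[2]["tl"])   # the tail is expanded on demand
```

After these two expansions the heap holds five cells, three of which are still unexpanded VAR cells.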

3.4.2 Partial Descriptions 

AMALIA requires that its input be total: both grammar rules and lexical entries are required
to consist of totally well-typed feature structures. This requirement might be problematic for
the grammar writer, who might prefer to specify only partial information. To this end, the compiler
employs a pre-processor that performs type inference on the partial input; the result of this
processing is almost total, but partiality is maintained in certain cases.

Recall that a normal term consists of a tag, a type and a sequence of arguments, each of which
is a normal term. Whenever some sub-term is the most general term of its type, it is substituted by
the type name only. Using the running example of figure 3.1, the term a(bot,d()) can be replaced
by the term a.

When the compiler encounters such a partial description, it creates one of the following two
instructions: put_var t/n, Xi, if the type t is part of a query code, or get_var t/n, Xi, if it
is part of a program code. put_var is very similar to put_node, with two differences: it creates
a VAR-cell, rather than an STR-cell, on the heap; and it does not leave space for REF-cells,
as there won't be any. get_var is the analog of get_structure, but is much simpler: it uses
build_most_general_fs to create the most general feature structure of type t on top of the heap,
and then calls unify to unify this newly created TFS with the one that is pointed to by Xi. Thus,
partial descriptions in the input result in more efficient code, and consequently in faster, more
space-economic processing.

^The effect of get_structure and the definition of unify are modified in a straightforward way to accommodate
VAR cells.



3.4.3 Empty Categories 

The presence of empty categories (ε-rules) in a grammar causes both theoretical and practical
problems. There is a current trend in HPSG of avoiding empty categories altogether, for
theoretical linguistic and cognitive reasons (see, e.g., (Sag and Fodor, 1994)). From a computational
point of view, such categories always cause considerable efficiency degradation.

AMALIA is designed to support empty categories as an inherent part of the input grammars.
Empty categories are processed by the compiler at compile time. Each category is matched against
every element i in the body of every rule r, and if the unification succeeds, a new rule is created:
this rule consists of r, modified by the effects of the unification, in which the i-th element is
removed. This process can be shown to yield an equivalent grammar, if it terminates.

However, for certain grammars the process never terminates, since it can lead to the creation
of new empty categories (when it is applied to rules with just one element in their bodies). A typical
example would be the rule

    [list, hd: [a], tl: [1]]  →  [1][list]

When it is applied to the empty category [elist], a new empty category is created:

    [list, hd: [a], tl: [elist]]

This new empty category can, in turn, be unified with the head of the rule, etc. To eliminate
such infinite loops and to maintain efficiency even in the face of empty categories, the compiler limits
their application: new rules that were obtained by applying some empty category to an original
grammar rule cannot be applied to other empty categories. This implies that a single grammar
rule cannot derive two empty categories. Since empty categories are usually designed to operate
in a very limited context, this seems to be a reasonable compromise.
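
The restriction can be sketched in Python as follows. The sketch makes a strong simplifying assumption: categories are atomic symbols, so that unification degenerates to equality, and a rule is a pair of a head and a tuple of body categories. The function `eliminate_empty` and this encoding are illustrative and are not taken from the compiler. Only the original rules are scanned, so derived rules are never combined with empty categories again and the pass always terminates.

```python
def eliminate_empty(rules, empties):
    """One pass of empty-category elimination over the original rules only."""
    new_rules = []
    new_empties = []
    for head, body in rules:
        for i, cat in enumerate(body):
            if cat in empties:
                rest = body[:i] + body[i + 1:]
                if rest:
                    # The rule with its i-th body element removed.
                    new_rules.append((head, rest))
                else:
                    # A one-element body yields a new empty category, which is
                    # recorded but never applied to other rules.
                    new_empties.append(head)
    return rules + new_rules, list(empties) + new_empties


rules = [("s", ("np", "vp")), ("np", ("det", "n")), ("l", ("e",))]
all_rules, all_empties = eliminate_empty(rules, ["det", "e"])
```

In this run the rule for np gives rise to a derived rule without the determiner, and the one-element rule for l contributes l as a new empty category.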



3.4.4 Lexical Ambiguity 

The lexicon associates every word w with a set of feature structures Cat(w). If this set contains
more than one element, w is said to be ambiguous. AMALIA processes the lexicon at compile
time: to every input word w_i the lexicon assigns a normal term, which is transformed to machine
instructions. If w_i is ambiguous the lexicon assigns it several normal terms. The code that is
generated for these terms is regular query code; however, the instruction that separates the code
of one term from the code of another, if both are associated with the same word, is same_word
instead of proceed. The only difference between the two instructions is that the former does not
increment the value of the special purpose register LEN. At run time, proceed causes the machine
to search for the next lexical entry, whereas same_word does not. Thus, the execution of the code
that was generated for an ambiguous lexical entry w causes several complete edges to be inserted
into the chart, one for each element of Cat(w).
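
A toy interpreter makes the difference between the two instructions concrete. The instruction stream, the `build` pseudo-instruction (standing for the query code of one normal term) and the representation of edges as triples are assumptions of this sketch, as is the choice to let both instructions insert the completed edge; only the treatment of LEN follows the description above.

```python
def run_lexical_code(code):
    """Interpret a toy instruction stream; returns (chart edges, final LEN)."""
    length = 0       # plays the role of the LEN register
    edges = []       # complete edges as (left, right, category)
    pending = None
    for op, arg in code:
        if op == "build":        # stands for the query code of one term
            pending = arg
        elif op in ("same_word", "proceed"):
            edges.append((length, length + 1, pending))
            if op == "proceed":  # only proceed moves on to the next word
                length += 1
    return edges, length


code = [
    ("build", "noun"), ("same_word", None),   # first reading of word 0
    ("build", "verb"), ("proceed", None),     # second reading of word 0
    ("build", "det"), ("proceed", None),      # the only reading of word 1
]
edges, length = run_lexical_code(code)
```

Both readings of the ambiguous first word end up as complete edges over the same chart positions.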






3.4.5 Functional Attachments 



While the phrase structure grammar organization underlying our design is usually appropriate for
constructing grammars for natural languages, there is sometimes a need for computations that are
not easily expressed using the formalism. Although contemporary grammatical formalisms tend
to be highly declarative in nature, grammar writers might find it useful to resort to some mechanism
that enables simple computations to be executed without the full power of the grammatical
formalism. ALE supports this need to the fullest, by incorporating a complete system of definite
clause attachments to grammar rules. Basically, this is a version of a Prolog-like programming
language, where the basic units are TFSs rather than FOTs.

AMALIA does not include such a module. As a limited solution, we implemented a small
set of functions that can be used by the grammar writer; these functions are executed during the
parsing process and their results might be integrated with the parsing.

As an example, consider the pre-defined function append. It receives two parameters, which
must be lists of TFSs, and returns a list consisting of their concatenation. The grammar writer
can use append by integrating it in the grammar: following the body of any rule a goal of the form
'goal> append(L1,L2,L3)' can be placed. The variables L1 and L2 must be associated with lists,
and after the goal is executed, the variable L3 will be bound to the concatenation of the input lists.
Now, L3 can be used in the head of the rule.

Since parsing is performed bottom-up, goals can only be placed after all the elements of the
body of a rule; their input parameters must be instantiated, and the output parameter can only be
used in the head of the rule. Currently, only a small number of functions (mainly for handling lists
and sets) are integrated into AMALIA, but more can be easily added. It must be noted, though,
that this situation is very different from ALE, in which the user can define any definite clause
relation.
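
The calling convention of such goals can be sketched in Python. The function `run_goal`, the dictionary of variable bindings and the use of Python lists and sets in place of TFSs are assumptions of this sketch; it shows only that the inputs must already be bound when the goal runs and that the last argument receives the result.

```python
def run_goal(name, args, bindings):
    """Execute one built-in goal; the last argument is the output variable."""
    *inputs, output = args
    # A KeyError here corresponds to an input parameter that is not instantiated.
    values = [bindings[v] for v in inputs]
    if name == "append":
        result = values[0] + values[1]
    elif name == "union":
        result = values[0] | values[1]
    else:
        raise ValueError("unknown function: " + name)
    bindings[output] = result
    return bindings


b = run_goal("append", ["L1", "L2", "L3"], {"L1": ["a"], "L2": ["b", "c"]})
q = run_goal("union", ["QS", "QH", "QM"], {"QS": {1}, "QH": {2}})
```

After the call, `b["L3"]` holds the concatenated list and can be used when the head of the rule is built.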



3.5 Implementation 

AMALIA is implemented as a complete grammar development system, containing a compiler
from the ALE input language to the abstract machine language, an interpreter for the machine
instructions, a simple debugger for the machine language and a graphical user interface (GUI) that
eases the process of grammar design and debugging. The major part of the software is written
in C; the compiler is written using yacc and lex, and the graphical user interface is implemented
using Tcl/Tk (Ousterhout, 1994). The system is implemented on a Sun SPARCstation under the
Solaris operating system.

The system was tested with a wide variety of grammars, mostly adaptations of existing ALE
grammars. It is important to note that AMALIA does not provide the wealth of input specifications
that ALE does. ALE features that are not included in AMALIA include lexical rules,
free use of definite clause attachments and disjunctive descriptions. On the other hand, development
of grammars in AMALIA is made easier by the GUI and by its improved performance over
ALE. A complete description of AMALIA's implementation, its deviations from ALE's input
language and a complete users' guide are given in Wintner, Gabrilovich, and Francez (1997).

To compare AMALIA with ALE we have used a few benchmark grammars. Both systems
were used to compile the same grammar and to parse the same strings. We briefly describe
each of the grammars below, and summarize the results of a performance comparison of AMALIA and
ALE in figure 3.24. All times are in seconds; in ALE we measured the time for the first result to
be displayed, and in AMALIA the time for all the results.
The first grammar is an early version of the HPSG-based Hebrew grammar described in the
next chapter. It consists of 4 rules and one empty category; the type hierarchy contains 84
types and 32 features, and the lexicon contains 13 words. The second grammar is an HPSG-
based grammar for a subset (emphasizing relative clauses) of the Russian language, developed by
Evgeniy Gabrilovich and Arkady Estrin. It consists of 8 rules and 76 lexical entries; the type
hierarchy contains 151 types and 31 features. The third example is a simple grammar generating
the language {a^n b^n | n > 0}. While the execution times for this simple grammar are less important,
the differences in compilation time indicate a major advantage in using AMALIA for instructional
purposes; in such cases grammars are compiled over and over again, while they are usually executed
only a few times.



Task                                  ALE     AMALIA

Grammar 1
  Compilation                         35.0       1.4
  Parsing, 6 words, 2 results          0.5       0.5
  Parsing, 10 words, 8 results         3.2       0.8
  Parsing, 14 words, 125 results     140.0       9.0

Grammar 2
  Compilation                         68.0       2.3
  Parsing, 2 words, 2 results          0.5       0.8
  Parsing, 4 words, 2 results          2.4       0.9
  Parsing, 7 words, 2 results          5.1       1.1
  Parsing, 8 words, 2 results          7.8       1.2
  Parsing, 12 words, 2 results        17.0       1.5

Grammar 3
  Compilation                          6.5       0.2
  Parsing, n=4                         0.1       0.2
  Parsing, n=8                         0.8       0.3
  Parsing, n=16                        2.8       1.1
  Parsing, n=32                       26.0      16.0

Figure 3.24: Performance comparison of AMALIA and ALE
Chapter 4

An HPSG-based Grammar for 
Hebrew 



In order to test the validity of the abstract machine and its appropriateness for designing HPSG- 
based grammars, we have devised a small-scale grammar for a fragment of the Hebrew language, 
based upon the principles of HPSG as stipulated in Pollard and Sag (1994). It must be emphasized 
that the main objective of the grammar design was to verify the machine, and therefore its linguistic 
contributions are minor. Still, it might serve as the starting point for the construction of a larger 
scale, broad coverage grammar for the language. 

The Hebrew script uses a character set that differs from the one that appears on an ordinary
keyboard. The script is highly ambiguous, as most of the vowels are not written; furthermore,
many particles (prepositions, articles and conjunctions) are attached (in the script) to the words
succeeding them. Since the problem of morphological analysis of Hebrew, even when represented
in the Hebrew script, is practically solved, we have decided in this work to use a transcription of
Hebrew, known as Phonemic Script^ (Ornan, 1986; Ornan, 199^; Ornan and Katz, 1995). First,
it uses only symbols that appear on any computer keyboard; second, it is unambiguous, similarly
to the scripts of most European languages.

We first list (section 4.1) some of the major HPSG schemata that serve to combine different
kinds of phrases, along with their adaptation to our needs. Section 4.2 describes the structure of
noun phrases, and in section 4.3 we concentrate on the status of the definite article in Hebrew.
Section 4.4 briefly discusses noun-noun constructs. The complete grammar is listed in appendix H.



4.1 Phrase Structure Schemata 

HPSG "rules" are organized as a set of principles that set constraints on the properties of well-
formed phrases, along with a set of ID schemata that license certain phrase structures. The
schemata are independent of the categories of the involved phrases; they state general conditions
for the construction of larger phrases out of smaller ones, according to the function of the sub-
phrases. In (Pollard and Sag, 1994) six schemata are listed; we have adopted four of them in our
grammar.

^This script was accepted as standard number ISO-DIS 259-3.
ID schemata only license certain phrase combinations. They do not specify all the constraints
imposed on the involved sub-phrases, as these are articulated by the principles. However, in a
system that is based on phrase-structure rules (e.g., ALE) the principles and the schemata must
be interleaved: each rule encodes not only the phrase structure, but also constraints imposed by
the grammar principles.

Consider, for example, the head-subject schema of HPSG, which states that a phrase with an
empty subj list can be constructed by combining a (head) phrase, whose subj list is of length 1, with
a (subject) phrase. Nothing in the schema relates the subject to the head; it is the subcategorization
principle that requires that the subject be unifiable with the single element in the head's subj list.
Furthermore, the head feature principle requires that the values of the head features in both the
phrase itself and its head sub-phrase be identical. The first rule listed below (figure 4.1) combines
these constraints: it states that a phrase can be constructed out of two sub-phrases, the subject
and the head, where the first element (the value of the hd feature) in the subj list of the head is
token-identical to the subject (through the use of the Subj variable), and the head features of the
phrase and its head are token-identical (through the use of the Head variable).

Subject-Head schema Most importantly, this schema licenses the combination of a subject with
a predicate to form a sentence. The properties of the subject are taken from the subj feature
of the head daughter. The schema is listed in figure 4.1.

Head-Complement schema The rest of the complements, other than the subject, are combined
with the head by the head-complement schema. Once again, the appropriate complements
are determined by the head and are specified as the elements in the list comps, as shown in
figure 4.2.

Head-Marker schema Markers are used to guarantee that a certain modifier combines only
once with a certain head. A typical example is quantifiers (such as 'every') modifying nouns.
This schema is listed in figure 4.3.

Head-Adjunct schema Adjuncts can be combined with the heads they modify over and over
again. In HPSG adjuncts select their heads: it is the adjunct that determines the features
of the head it might be attached to, through the value of the feature mod, as depicted in
figure 4.4.



4.2 The Structure of Noun Phrases in Hebrew 



A noun phrase (NP) is a phrase that is headed by a noun^ (N), optionally modified or complemented
by various adjuncts. In this section we list the possible adjuncts and briefly discuss their character.
A more thorough discussion of selected phenomena is provided in the next sections. More Hebrew
data as well as further references can be found in (Ornan, 1964; Ornan, 1979; Glinert, 1989;
Wintner, 1991; Yizhar, 1993).

A noun is a word whose head feature has the type noun and whose cont feature is of type
nom_obj. The head feature of nouns carries an additional (boolean) feature, def, which is explained
in section 4.3.2 below. Hebrew nouns are specified for gender, number and person^, and these three
features are listed as part of the index feature of nouns. Figure 4.5 depicts the lexical entry of the
common noun sepr (book), where '()' represents an empty list and '{}' denotes a set.

^Elliptic NPs might not contain a noun, but we don't discuss ellipsis here.
^Only pronouns are specified for person; other nouns are inherently third person.
% Schema 1 (ch. 9, p. 347)
% Subject - Head
subject_head rule
(phrase, cat:(cat, head:Head, subj:e_list, comps:Comps, spr:Spr, marking:Marking),
 cont:Cont, conx:backgr:BM, qstore:QM)
===>
cat> % subject
  (Subj, sign, cat:cat, cont:sem_obj, conx:backgr:BS, qstore:QS),
cat> % head
  (sign, cat:(cat, head:Head, subj:(hd:Subj, tl:e_list),
              comps:(Comps, e_list), spr:Spr, marking:Marking),
   cont:(Cont, sem_obj), conx:backgr:BH, qstore:QH),
goal> union(QS, QH, QM),
goal> union(BS, BH, BM).

Figure 4.1: Subject-Head schema

% Schema 2 (ch. 9, p. 348)
% Head - Complement
head_complement rule
(phrase, cat:(cat, head:Head, subj:Subj, comps:Comps, spr:Spr, marking:Marking),
 cont:Cont, conx:backgr:BM, qstore:QM)
===>
cat> % head
  (sign, cat:(cat, head:Head, subj:Subj,
              comps:(hd:Comp, tl:Comps),
              spr:Spr, marking:Marking),
   cont:Cont, conx:backgr:BH, qstore:QH),
cat> % complement
  (Comp, sign, cat:cat, cont:sem_obj, conx:backgr:BC, qstore:QC),
goal> union(BH, BC, BM),
goal> union(QH, QC, QM).

Figure 4.2: Head-Complement schema

Hebrew is a relatively free constituent order language. Still, the order of the NP elements is
sometimes fixed. In particular, the adjuncts can be strictly classified as either pre-head or post-
head. Within each category a default ordering exists, from which some deviations are allowed. In
the discussion below the adjuncts are listed by this default ordering.

Pre-head adjuncts 

Determiners This is a closed class of words such as koll (all/every), robb (most-of), kamma (some) 
etc. 

Cardinal numbers Such as $lo$a (three). Cardinals appear in two forms: the regular one and
the 'nismak' form, discussed in section 4.3.3.

Definite article The definite article ha- is separated from the other determiners for reasons that
are explicated in section 4.3.

% Schema 4 (ch. 1, p. 51)
% Marker - Head
marker_head rule
(phrase, cat:(cat, head:Head, subj:Subj, comps:Comps, spr:Spr, marking:Marking),
 cont:Cont, conx:backgr:BM, qstore:(elt:Elt, elts:Elts))
===>
cat> % marker
  (word, cat:(cat, head:(mark, spec:HeadDtr),
              subj:list, comps:list, spr:list, marking:(Marking, marked)),
   cont:(Elt, quant, det:sem_det, restind:sem_obj),
   conx:backgr:BD, qstore:e_set),
cat> % head
  (HeadDtr, sign, cat:(cat, head:Head, subj:Subj, comps:Comps,
                       spr:Spr, marking:unmarked),
   cont:Cont, conx:backgr:BH, qstore:Elts),
goal> union(BD, BH, BM).

Figure 4.3: Head-Marker schema

% Schema 5 (ch. 9, p. 403)
% Head - Adjunct
head_adjunct rule
(phrase, cat:Cat, cont:Cont, conx:backgr:BM, qstore:QM)
===>
cat> % head
  (HeadDtr, sign, cat:Cat, cont:sem_obj, conx:backgr:BH, qstore:QH),
cat> % adjunct
  (sign, cat:head:(adj, defness:defness, mod:HeadDtr),
   cont:Cont, conx:backgr:BA, qstore:QA),
goal> union(BH, BA, BM),
goal> union(QH, QA, QM).

Figure 4.4: Head-Adjunct schema



Post-head adjuncts 

Nominal complement Hebrew allows a very elaborate system of nominal-nominal compounds.
The first nominal might be a noun or, rarely, an adjective; it is the syntactic head of the com-
pound, and it is morphologically marked. The second nominal can be any NP. Compounds
are discussed in section 4.4.
[word
 phon   : sepr
 cat    : [cat
           head    : [noun
                      def : -]
           subj    : ()
           comps   : ()
           spr     : ()
           marking : [marking]]
 cont   : [nom_obj
           index : [2] [index
                        per : third
                        num : sg
                        gen : masc]
           restr : { [psoa
                      nucleus : [book
                                 instance : [2]]] }]
 qstore : {}]

Figure 4.5: The lexical entry of the noun sepr



Adjectives Hebrew adjectives are marked for number, gender and definiteness, on which they
must agree with the head noun.

Ordinal numbers Such as $eni (second).

Possessives These include possessive pronouns such as $elli (mine) as well as phrases ($ell dan -
Dan's).

Prepositional phrases The rules that govern the combination of prepositional phrases with head
nouns in Hebrew are very similar to those in English.

Subcategorized complements Certain nouns subcategorize for particular complements. For
example, verbal nouns such as racon (wish) permit an infinitival verb phrase as a complement.
This is encoded in the list of complements (the value of comps) in the lexical entries of the
nouns.

Relative clauses Are not covered in our grammar.

As mentioned above, a thorough and complete description of the linguistic data is outside
the scope of this work. The reader is referred to, e.g., (Ornan, 1964; Ornan, 1979; Glinert, 1989;
Yizhar, 1993) for more details.



4.3 The Status of the Definite Article in Hebrew 
4.3.1 The Data



Hebrew marks definiteness in a way that differs considerably from English (but resembles other Semitic
languages, notably Arabic, and also modern Greek, as will be shown below). The definite article ha-
in Hebrew attaches to words, not to phrases. It combines with various kinds of nominals: common
nouns, a few proper nouns, adjectives, ordinal numbers, cardinal numbers and demonstratives.
Moreover, definite noun phrases in Hebrew are polydefinite: most of the elements of the phrase are
required to be explicitly definite, and there is a strict requirement that these elements agree on
definiteness for the phrase to be grammatical. Hebrew does have indefinite articles ('exxad, 'axxat,
'xadim), but their use is optional and not common. It is therefore useful to view bare nominals
(with no attached definite article) as indefinite by default. See examples (1) - (6) for some data.

(1) ha- sepr
    the book
    "the book"

(2) ha- sepr ha- gadol
    the book the big
    "the big book"

(3) ha- sepr ha- $eni
    the book the second
    "the second book"

(4) sepr ('exxad)
    book (one)
    "a book"

(5) sepr gadol ('exxad)
    book big (one)
    "a big book"

(6) sepr $eni
    book second
    "a second book"



4.3.2 HPSG Approach 

HPSG (as formulated in (Pollard and Sag, 1994)) uses two schemata to form simple noun phrases
(NPs): the head-marker schema combines a determiner (DET) with a noun (N), and the head-
adjunct schema combines any number of adjectives (ADJs) with an NP. Nouns subcategorize
for DET in English: the lexical entry of a singular noun explicitly states an anticipation for
a determiner. The combination of DET-N results in a full NP; the effect of the determiner is
recorded in the semantics of the phrase as the value of the QSTORE feature propagates from the
determiner to the mother. Adjectives 'select' the NP they modify in the sense that the NP is the
contents of the MOD feature in the adjunct's lexical entry. The head-adjunct schema treats the
NP as the head and the ADJ as the semantic head, so that the semantics of the phrase is inherited
from the adjunct.



The HPSG account of Pollard and Sag (1994) would not be appropriate for Hebrew due to
the differences in the structure of NPs in the two languages. Most notably, Hebrew nouns do
not subcategorize for determiners, for bare nouns qualify perfectly as complete NPs, as shown in
examples (1) - (6) above.

An alternative construction is the HPSG analysis of modern Greek NPs presented in (Kolliakou,
1996). It appears that in Greek, too, the definite article can attach to various kinds of nominals, and
the language exhibits both monadic definites and polydefinites. Thus, all three phrases in (7) - (9)
are grammatical:^

(7) to kokino podilato
    the red bike
    "the red bike"

(8) to kenurio to kokino podilato
    the new the red bike
    "the new red bike"

(9) ta dio ta podilata ta kokina
    the two the bikes the red
    "the two red bikes"

^The Greek examples are taken from Kolliakou (1996).


Kolliakou (1996) concludes that the Greek definite article is not a regular determiner, but
constitutes a category of its own. It does not head the phrase it occurs in (as was suggested
by (Netter, 1994) for Germanic languages); rather, it functions as an adjunct: it is optional, and it
selects the head it modifies by specifying this head as the value of the MOD feature. Furthermore,
the definite article marks the phrase it occurs in as definite. This is achieved by introducing a new
feature of nominals, UNIQUE,^ whose (boolean) value is '+' iff the nominal is definite. Naturally,
the value of this feature in the lexical entries of nominals is '-' (since they are indefinite by default).



4.3.3 An Analysis of Hebrew Definites 

The analysis of Kolliakou (1996) employs a non-quantificational approach to the semantics of
definites. The UNIQUE feature is a semantic one (it is part of the CONTents of a phrase), and is
the only indication of the definiteness of the phrase. This is in contrast to the approach of (Pollard
and Sag, 1994) that is based on Cooper Storage of quantifiers. Whatever approach to semantics is
taken, it is clear from examples (1) - (6) that agreement in definiteness among elements of the NP
in Hebrew is a morpho-syntactic process, and we account here for this component of the grammar
only.

In contrast to Greek, Hebrew exhibits no cases of monadic definites, so all we have to account
for is the case of polydefinites. A major observation here is that the definite article attaches only
to words. Therefore, it seems reasonable to account for definite article combination by means of
a lexical rule that creates a definite nominal out of an indefinite one. For a detailed discussion of
the definite article in Modern Hebrew, see (Ornan, 1964).

The Definite Lexical Rule (DLR) operates on various kinds of nominals: nouns (e.g., sepr),
adjectives (e.g., gadol), ordinals (e.g., $eni), demonstratives (e.g., ze) and cardinals (e.g., $lo$a). In
all categories its effect on the phonology is that of prefixing it with ha-. To emphasize the fact that
definiteness agreement in Hebrew is not a semantic process, we add a boolean feature DEF to the
CATegory of nominals (rather than to their CONTent). The DLR changes the value of the path
SYNSEM|LOC|CAT|DEF from '-' to '+'. When the DLR operates on adjuncts, it additionally
changes the value of the path MOD|LOC|CAT|DEF in the same manner. Thus it is guaranteed
that definite adjectives, for example, are not only specified as definite but also select definite heads.
The effect of the DLR when applied to a few nominals is exemplified in figure 4.6.



Since the process of adding the definite article takes place in the lexicon, the head-adjunct
schema can remain intact. Moreover, the agreement in definiteness between a nominal and its
adjuncts is stated in the lexical entry of the adjuncts, just like agreement on number and gender
is.
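
The effect of the DLR on nouns and adjuncts can be sketched in Python. The nested dictionaries below are a drastically simplified stand-in for TFSs: the paths are shortened (cat, def and mod instead of the full SYNSEM|LOC|CAT|DEF and MOD|LOC|CAT|DEF), the function name `definite_lexical_rule` is an assumption, and the special treatment of cardinals is not modelled.

```python
import copy


def definite_lexical_rule(entry):
    """Return a definite copy of an indefinite nominal entry (toy version)."""
    if entry["cat"]["def"] != "-":
        raise ValueError("the rule applies to indefinite nominals only")
    out = copy.deepcopy(entry)          # the input entry stays in the lexicon
    out["phon"] = "ha-" + entry["phon"]
    out["cat"]["def"] = "+"
    mod = out["cat"]["head"].get("mod")
    if mod is not None:
        # Adjuncts additionally select a definite head.
        mod["cat"]["def"] = "+"
    return out


sepr = {"phon": "sepr", "cat": {"head": {"type": "noun"}, "def": "-"}}
gadol = {
    "phon": "gadol",
    "cat": {"head": {"type": "adj", "mod": {"cat": {"def": "-"}}}, "def": "-"},
}
ha_sepr = definite_lexical_rule(sepr)
ha_gadol = definite_lexical_rule(gadol)
```

Because the output entry of an adjective requires a definite head through its mod value, an indefinite noun cannot combine with it, which is how the agreement is enforced without changing the schema.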

[Figure 4.6 shows four feature structures: the lexical entries of the noun sepr and the adjective
gadol, and the entries of ha-sepr and ha-gadol that the DLR derives from them.]

Figure 4.6: The effect of the Definite Lexical Rule

Cardinals introduce an irregularity to the analysis of definites. As mentioned above, cardinals
can combine with the definite article in Hebrew. However, such constructs are used only in elliptic
phrases. In full noun phrases, when the head noun is present, the definiteness agreement between
the head noun and the cardinal number is realized in a unique way: a definite noun does not
combine with a definite cardinal, but rather with the construct form of the cardinal, 'nismak'.
Many other nominals also have 'nismak' forms, which are used in noun-noun constructs
(see section 4.4). However, cardinals in this form are implicitly definite, as they combine only with
definite NPs.^ The data are summarized in examples (10) to (14) below.

^The specification of uniqueness has a semantic contribution in addition to its syntactic marking, but we suppress
a complete discussion of semantics here.

(10) $lo$a sparim
     three books
     "three books"

(11) ?$lo$a ha- sparim
     three the books
     "?the three books"

(12) $lo$t ha- sparim
     three-'nismak' the books
     "the three books"

(13) ?$lo$t sparim
     three-'nismak' books
     "?three books"

(14) *ha- $lo$a ha- sparim
     the three the books
     "?the three books"

^This rule has a few exceptions: the cardinal $nei (two-'nismak') is combined with both definite and indefinite
nouns, and there are a few indefinite nouns (such as me'ot (hundreds) or 'lapim (thousands)) that require 'nismak'
cardinals. The phrases preceded by '?' are marked, archaic forms.

Notice that (14) is ungrammatical because the correct way of marking definiteness of cardinals in
Hebrew is by using the 'nismak' form, and not because the phrase ha-$lo$a is ungrammatical.
Indeed, this last phrase can be used in elliptical structures such as (15):

(15) qaniti $lo$a sparim. koll ha- $lo$a b-'anglit
     I-bought three books. All the three in-English
     "I bought three books. All three of them are in English"

The 'nismak' form of nominals is an inflection of their regular form, and therefore is obtained
as the outcome of a lexical rule. As far as definiteness is concerned, when the DLR operates on an
indefinite cardinal, its output is the 'nismak' form rather than the regular combination of ha- and
the cardinal. All other details remain the same. The definite ha- is combined with cardinals by a
different mechanism that is not discussed here.



4.4 Noun-Noun Constructs 

Noun-noun compounds are constructed in a special way in Hebrew: the head noun, which appears
first in the compound, has a marked morphological form,^ the 'nismak'. Most NPs can serve as the
adjunct of a compound. Syntactically, the compound inherits all the features of the 'nismak', with 
the exception of definiteness, which is inherited from the NP complement. Consider the following 
examples: 

pirxei gann yapim 

(16) flowers-pl-'nismak' garden-sg beautiful-pl
"beautiful garden flowers" 

pirxei ha- gann ha- yapim 

(17) flowers-pl-'nismak' the garden-sg the beautiful-pl
"the beautiful garden flowers"

In both examples the entire phrase is in plural, as can be seen from the adjective, because the
head noun pirxei is in plural. However, (17) is definite while (16) isn't, due to the definite article
modifying the complement gann.

The process of compounding is recursive, as the resulting compound is a legitimate NP for
combining with some other 'nismak' form. When more than two nouns are combined, the resulting
phrase might be (if the nouns have the same gender and number) syntactically ambiguous:
example (18) can be translated as "my fat aunt's cow" or as "my aunt's fat cow".

^For many nouns in Hebrew, especially among singular masculine and plural feminine, this form is identical to 
the regular form. 
parat dodati ha- $mena

(18) cow-'nismak' my-aunt the fat-f 

"my fat aunt's cow / my aunt's fat cow" 

The 'nismak' form is generated from the regular noun form by means of the 'nismak' lexical
rule (NLR). Apart from modifying the phonology of the noun, this rule has a double effect. First,
it adds a subcategorized NP complement to the COMP list of the noun, to express the expectation
for an NP complement; second, it unifies the values of the DEF feature of the noun and its newly
added complement. When the noun is complemented, the resulting phrase inherits the definiteness
from the adjunct. Figure 4.7 depicts the effect of applying the NLR to the noun praxim (flowers).



Figure 4.7: The effect of the 'nismak' lexical rule (the lexical entry of praxim before, and of pirxei after, the application of the NLR)



Notice that the lexical entry of the 'nismak' noun pirxei, listed in figure 4.7, does not specify
any value for the DEF feature. Hence, the DLR cannot be applied to pirxei, as it only applies to
nominals that are specified as DEF -. This corresponds to the observation that 'nismak' nouns
cannot be modified by the definite article in Hebrew. Once the 'nismak' lexical rule has been applied,
the lexical entries of the resulting 'nismak'-form nouns specify that they subcategorize for a nominal
complement. Noun-noun compounds can thus be constructed by the head-complement schema (figure 4.2).
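
The double effect of the NLR can be illustrated with a minimal sketch in Python. It is not the implementation used in this work: the dictionary encoding of lexical entries, the class Var (standing for a reentrant, possibly unspecified value) and the function names are assumptions made for illustration only, as is the value "-" for the DEF feature of praxim.

```python
import copy

class Var:
    """A shared (reentrant) value; features holding the same Var are unified."""
    def __init__(self, value=None):
        self.value = value  # None means "unspecified"

def nismak_lexical_rule(entry, nismak_phon):
    """Hypothetical rendering of the NLR on a dict-based lexical entry."""
    out = copy.deepcopy(entry)
    out["phon"] = nismak_phon                 # the phonology is modified
    shared_def = Var()                        # DEF is left unspecified ...
    out["cat"]["def"] = shared_def
    # ... and shared with the DEF of the newly added nominal complement.
    complement = {"head": "nominal", "def": shared_def}
    out["cat"]["subcat"] = out["cat"]["subcat"] + [complement]
    return out

def definite_rule_applies(entry):
    """The DLR applies only to nominals that are specified as DEF -."""
    d = entry["cat"]["def"]
    value = d.value if isinstance(d, Var) else d
    return value == "-"

def combine_with_complement(head, complement_def):
    """Bind the complement's DEF; the head's DEF changes with it."""
    head["cat"]["subcat"][0]["def"].value = complement_def
    return head["cat"]["def"].value

# Assumed input entry: the regular form praxim, specified as DEF -.
praxim = {"phon": "praxim", "cat": {"head": "noun", "subcat": [], "def": "-"}}
pirxei = nismak_lexical_rule(praxim, "pirxei")
```

In this sketch definite_rule_applies(praxim) holds while definite_rule_applies(pirxei) does not, and combining pirxei with a definite complement makes the resulting DEF value definite as well, because both features refer to the same Var object.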
Chapter 5

Conclusion 



As linguistic formalisms become more rigorous, the necessity of well defined semantics for grammars 
increases. We presented an operational semantics for TFS-based formalisms, making use of an 
abstract machine specifically tailored for this kind of application. In addition, we described a
compiler for a general TFS-based language. The compiled code, in terms of abstract machine 
instructions, can be interpreted and executed on ordinary hardware. 

We have formalized in this thesis the concepts of grammars and languages for linguistic
formalisms that are based on typed feature structures, using the notion of multi-rooted structures
that generalize feature structures. We use multi-rooted structures for representing grammar rules
as well as (the equivalent of) sentential forms that are generated during parsing. We described
a computational process that corresponds to parsing with respect to such formalisms. We thus
achieved two different specifications, namely a declarative (derivation-based) one and an algebraic
(computation-based) one, for the semantics of those formalisms. Next, we have proved that the
two specifications coincide, namely that the computational process induced by the algebraic
specification is correct with respect to the declarative specification. Finally, we formally characterized a
subset of the grammars, off-line parsable ones, for which termination of parsing can be guaranteed.
Making use of the well-foundedness of the subsumption relation, we proved that for every grammar
in this class, parsing is finitely terminating.

This view of parsing with typed feature structures is the basis for the design of AMALIA,
an abstract machine specifically tailored for executing code that is compiled from grammars. We
detailed the architecture of the machine, its data structures and instruction set, along with the
process of compilation of ALE grammars. The use of abstract machine techniques results in
highly efficient processing. The system was implemented and a comparison to ALE shows a great
improvement in both compilation and execution times.

The current implementation of AMALIA is not fully compatible with ALE. Several features
of ALE are missing in our implementation, and therefore a natural extension of this project would
be to add them. Most notably, AMALIA doesn't support the use of lexical rules, which are
considered important for any reasonable grammar of natural languages. ALE also includes a
component of definite clauses over TFSs which is missing in AMALIA; an interesting extension
would be to link the abstract machine with a WAM-like machine that can handle definite clauses.

AMALIA's current compiler is relatively basic, and several optimizations might be introduced
to it in the future. A major optimization might be achieved by incorporating static (compile-time)
analysis of grammars. Several interesting questions, relating grammars to computer programs,
arise: for example, what is the equivalent of dead-code elimination? How can concepts of structured
programming be transferred to grammars? Can modules be defined for grammars, too?

A different line of improvements concerns the parsing algorithm incorporated by AMALIA.
Currently only one, relatively simple, algorithm is inherent to the machine. An interesting
extension would be to implement various algorithms, probably with user control over them, and to
experiment with their time and space efficiency.

AMALIA is currently being used as a platform for developing an HPSG grammar for the
Hebrew language. While this endeavor is still underway, it serves as a realistic use of the
system. The development of the grammar already resulted in many improvements and extensions
to AMALIA, and the system proved stable and reliable enough to support it. We presented a
partial HPSG-based grammar for a fragment of Hebrew, concentrating on noun phrases. We hope
that this endeavor will serve as the basis for a more comprehensive, broad-coverage grammar of
Hebrew.
Appendix A

List of Machine Instructions 



The following table lists, for quick reference, the machine instructions and functions.

Query processing
    put_node t/n, Xi
    put_arc Xi, offset, Xj
    proceed Xi

Type unification
    build_str t
    build_ref_and_unify i
    build_ref i
    build_self_ref i
    build_var t
    unify_feat i

Program processing
    get_structure t/n, Xi
    unify_variable Xi
    unify_value Xi
    put_rule L

Control
    first_key
    next_key
    check_key
    tst_active_edges I
    next_active_edge I
    tst_complete_edges I
    next_complete_edge I
    call
    load_fs r
    copy_active_edge I
    copy_complete_edge X

Auxiliary functions
    bind(addr1, addr2)
    deref(a) : address
    unify(addr1, addr2) : boolean
    fail()
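
The auxiliary functions deref and bind presumably follow the pattern familiar from the WAM. The following is a minimal sketch in Python under that assumption: the heap is a list of tagged cells, and an unbound variable is a REF cell that points to itself. The cell layout and the helper names new_ref and new_str are hypothetical and serve illustration only; they are not the machine's actual definitions.

```python
# Assumed heap layout: a list of (tag, value) cells.
#   ("REF", address)   a reference; unbound if it points to itself
#   ("STR", type_name) a node labeled by a type
heap = []

def new_ref():
    """Allocate an unbound variable (a self-referencing REF cell)."""
    addr = len(heap)
    heap.append(("REF", addr))
    return addr

def new_str(type_name):
    """Allocate a node labeled by the given type."""
    heap.append(("STR", type_name))
    return len(heap) - 1

def deref(a):
    """Follow a chain of REF cells until an unbound REF or an STR cell."""
    while True:
        tag, value = heap[a]
        if tag == "REF" and value != a:
            a = value
        else:
            return a

def bind(addr1, addr2):
    """Make the cell at addr1 a reference to addr2."""
    heap[addr1] = ("REF", addr2)

# Usage: x is bound to y, and y to a node of type noun.
x = new_ref()
y = new_ref()
s = new_str("noun")
bind(x, y)
bind(y, s)
```

After these two bindings, dereferencing either x or y leads to the address of the noun node.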
Appendix B

The Hebrew Grammar 



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% File: Hebrew grammar
%
% Includes:  1. Schema 1
%            2. Schema 2
%            3. Schema 4
%            4. Schema 5
%            5.
%            6. Head feature principle
%            7. Valence principle (subj, comps, spr)
%            8. Semantics principle (cont) - partial
%            9. Contextual Consistency (conx) - approximate
%           10. Quantifier storage - preliminary
%
%



%%%********************** Type Hierarchy
%th

bot sub [sign, list,
         set, cat, sem_obj, sem_det, conx, qfpsoa, index, per, num, gend,
         head, vform, pform, defness, marking, boolean].

sign sub [word, phrase] intro [cat:cat, cont:sem_obj, conx:conx, qstore:set_quant].
  word sub [].
  phrase sub [].

cat sub [] intro [head:head, subj:list, comps:list, spr:list, marking:marking].
sem_obj sub [psoa, nom_obj, quant].
  nom_obj sub [pron, npro] intro [index:index, restr:set_psoa].
    pron sub [].
    npro sub [].
  psoa sub [] intro [nucleus:qfpsoa].
  quant sub [] intro [det:sem_det, restind:npro].

sem_det sub [forall, exists, the].
  forall sub [].
  exists sub [].
  the sub [].

conx sub [] intro [backgr:set_psoa].

qfpsoa sub [un_relation, cn, naming].
  un_relation sub [walk, sing, red, big, bin_relation] intro [agent:index].
    walk sub [].
    sing sub [].
    red sub [].
    big sub [].
    bin_relation sub [see, eat, tri_relation] intro [theme:index].
      see sub [].
      eat sub [].
      tri_relation sub [sell, give] intro [patient:index].
        sell sub [].
        give sub [].
  cn sub [book, apple] intro [instance:index].
    book sub [].
    apple sub [].
  naming sub [dan, dana] intro [bearer:index].
    dan sub [].
    dana sub [].

index sub [] intro [per:per, num:num, gend:gend].
per sub [first, second, third].
  first sub [].
  second sub [].
  third sub [].
num sub [sg, pl].
  sg sub [].
  pl sub [].
gend sub [masc, fem].
  masc sub [].
  fem sub [].

head sub [subst, func].
  func sub [mark] intro [spec:sign].
    mark sub [det].
      det sub [].
  subst sub [nominal, verb, prep].
    nominal sub [noun, adj, numeral] intro [defness:defness].
      noun sub [].
      adj sub [] intro [mod:sign].
      numeral sub [].
    prep sub [] intro [pform:pform].
    verb sub [] intro [vform:vform].
% I decided not to add a 'mod' feature to all substantials, since in
% most of the cases (excluding adjectives) its value is 'none'.

defness sub [indef, def].
  indef sub [].
  def sub [].

vform sub [fin, bse].
  fin sub [].
  bse sub [].

pform sub [to, in].
  to sub [].
  in sub [].

marking sub [marked, unmarked].
  marked sub [comp, determiner, quantifier].
    comp sub [].
    determiner sub [].
    quantifier sub [].
  unmarked sub [].

boolean sub [yes, no].
  yes sub [].
  no sub [].

list sub [e_list, ne_list].
  ne_list sub [] intro [hd:bot, tl:list].
  e_list sub [].

set sub [e_set, ne_set, set_psoa, set_quant].
  e_set sub [].
  ne_set sub [ne_set_psoa, ne_set_quant] intro [elt:bot, elts:set].
  set_psoa sub [e_set, ne_set_psoa].
    ne_set_psoa sub [].     % intro [elt:psoa, elts:set_psoa].
  set_quant sub [e_set, ne_set_quant].
    ne_set_quant sub [].    % intro [elt:quant, elts:set_quant].

%%%********************** Macros
%macros

propn(Num, Gen, Name) macro
  (word, cat:(cat, head:noun, subj:e_list, comps:e_list, spr:e_list),
   cont:(npro, index:(Ind, per:third, num:Num, gend:Gen), restr:elt:Sem),
   conx:backgr:(ne_set_psoa, elt:(Sem, psoa, nucleus:(Name, bearer:Ind)),
                elts:e_set)).

np(Per, Num, Gen, Index) macro
  (sign, cat:head:noun,
   cont:(nom_obj, index:(Index, per:Per, num:Num, gend:Gen))).

noun(Num, Gen, Sem, Def) macro
  (word, cat:(head:(noun, defness:Def), subj:e_list, comps:e_list, spr:e_list),
   cont:(nom_obj, index:(Ind, per:third, num:Num, gend:Gen),
         restr:(elt:(psoa, nucleus:(Sem, instance:Ind)), elts:e_set)),
   qstore:e_set).

intrans macro (cat:comps:e_list).

trans macro
  (cat:comps:(hd:(@ np(Per, Num, Gen, Theme)), tl:e_list),
   cont:nucleus:theme:Theme).

ditrans macro
  (cat:comps:(hd:(@ np(Per, Num, Gen, Theme)),
              tl:hd:(@ np(Per1, Num1, Gen1, Patient)),
              tl:tl:e_list),
   cont:nucleus:patient:Patient).

verb(Per, Num, Gen, Sem) macro
  (word, cat:(cat, head:(verb, vform:fin),
              subj:hd:(@ np(Per, Num, Gen, SubjInd)),
              marking:unmarked),
   cont:nucleus:(Sem, agent:SubjInd),
   conx:backgr:e_set,
   qstore:e_set).

nominal(Def, Ind) macro
  (sign, cat:(head:(nominal, defness:Def),
              subj:e_list, comps:e_list, spr:e_list, marking:unmarked),
   cont:(nom_obj, index:Ind)).

adj(Num, Gen, Sem, Def) macro
  (word, cat:(head:(adj, defness:Def,
                    mod:(@ nominal(Def, Ind))),
              subj:e_list, comps:e_list, spr:e_list),
   cont:(nom_obj, index:(Ind, num:Num, gend:Gen),
         restr:(elt:(psoa, nucleus:Sem, nucleus:agent:ModInd))),
   qstore:e_set).

%%%********************** Empty Categories

%%%********************** Grammar Rules
%grammar

% Schema 1 (ch. 9, p. 347)
% Subject - Head
subject_head rule
  (phrase, cat:(cat, head:Head,
                subj:e_list,
                comps:Comps,
                spr:Spr, marking:Marking),
   cont:Cont,
   conx:backgr:BM,
   qstore:QM)
  ===>
  cat>                                  % subject
    (Subj, sign, cat:cat,
     cont:sem_obj,
     conx:backgr:BS,
     qstore:QS),
  cat>                                  % head
    (sign, cat:(cat, head:Head,
                subj:(hd:Subj, tl:e_list),
                comps:(Comps, e_list),  % comps is required to be empty so that
                                        % subject is added after all the complements.
                spr:Spr, marking:Marking),
     cont:(Cont, sem_obj),
     conx:backgr:BH,
     qstore:QH),
  goal> union(QS, QH, QM),
  goal> union(BS, BH, BM).

% Schema 2 (ch. 9, p. 348)
% Head - Complement
head_complement rule
  (phrase, cat:(cat, head:Head,
                subj:Subj,
                comps:Comps,
                spr:Spr, marking:Marking),
   cont:Cont,
   conx:backgr:BM,
   qstore:QM)
  ===>
  cat>                                  % head
    (sign, cat:(cat, head:Head,
                subj:Subj,
                comps:(hd:(Comp, sign, cat:cat,
                           cont:sem_obj,
                           conx:backgr:BC,
                           qstore:QC),
                       tl:Comps),
                spr:Spr, marking:Marking),
     cont:Cont,
     conx:backgr:BH,
     qstore:QH),
  cat>                                  % complement
    Comp,
  goal> union(BH, BC, BM),
  goal> union(QH, QC, QM).

% Schema 4 (ch. 1, p. 51)
% Marker - Head
marker_head rule
  (phrase, cat:(cat, head:Head,
                subj:Subj,
                comps:Comps,
                spr:Spr, marking:Marking),
   cont:Cont,
   conx:backgr:BM,
   qstore:(elt:Elt, elts:Elts))
  ===>
  cat>                                  % marker
    (word, cat:(cat, head:(mark, spec:HeadDtr),
                subj:list, comps:list, spr:list, marking:(Marking, marked)),
     cont:(Elt, quant, det:sem_det, restind:sem_obj),
     conx:backgr:BD,
     qstore:e_set),
  cat>                                  % head
    (HeadDtr, sign, cat:(cat,
                         head:Head,
                         subj:Subj,
                         comps:Comps,
                         spr:Spr, marking:unmarked),
     cont:Cont,
     conx:backgr:BH,
     qstore:Elts),
  goal> union(BD, BH, BM).



empty
  (sign, cat:(head:(det, spec:(sign, cat:(head:noun, subj:e_list, comps:e_list,
                                          spr:e_list),
                               cont:(Npro, index:(per:third, num:sg)),
                               qstore:e_set)),
              subj:e_list, comps:e_list, spr:e_list, marking:quantifier),
   cont:(quant, det:exists, restind:Npro),
   conx:backgr:e_set,
   qstore:e_set).

% Schema 5 (ch. 9, p. 403)
% Head - Adjunct
%
% modification: the marking feature is shared by the adjunct and the
% head (to require definiteness agreement)
head_adjunct rule
  (phrase, cat:Cat,
   cont:Cont,
   conx:backgr:BM,
   qstore:QM)
  ===>
  cat>                                  % head
    (HeadDtr, sign, cat:Cat,
     cont:sem_obj,
     conx:backgr:BH,
     qstore:QH),
  cat>                                  % adjunct
    (sign, cat:head:(adj, defness:defness, mod:HeadDtr),
     cont:Cont,
     conx:backgr:BA,
     qstore:QA),
  goal> union(BH, BA, BM),
  goal> union(QH, QA, QM).

%%%********************** Lexical Entries
%lexicon

dan --->
  (@ propn(sg, masc, dan)).
dana --->
  (@ propn(sg, fem, dana)).
sepr --->
  (@ noun(sg, masc, book, indef)).
ha-sepr --->
  (@ noun(sg, masc, book, def)).
sparim --->
  (@ noun(pl, masc, book, indef)).
$ar --->
  (@ verb(third, sg, masc, sing), (@ intrans)).
$ara --->
  (@ verb(third, sg, fem, sing), (@ intrans)).
"akal --->
  (@ verb(third, sg, masc, eat), (@ trans)).
natein --->
  (@ verb(third, sg, masc, give), (@ ditrans)).
"adomm --->
  (@ adj(sg, masc, red, indef)).
gadol --->
  (@ adj(sg, masc, big, indef)).
ha-gadol --->
  (@ adj(sg, masc, big, def)).
gdolim --->
  (@ adj(pl, masc, big, indef)).
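
The type hierarchy of this grammar induces a subsumption ordering in which bot is the most general type, and the unification of two types is their most general common subtype, if one exists. The following Python sketch illustrates this on the fragment of the hierarchy that concerns lists and sets. The dictionary SUB and the functions subtypes and unify_types are hypothetical names introduced for illustration; the machine itself is not assumed to compute type unification in this way.

```python
# Immediate subtypes, copied from a fragment of the hierarchy above.
SUB = {
    "bot": ["list", "set"],
    "list": ["e_list", "ne_list"],
    "set": ["e_set", "ne_set", "set_psoa", "set_quant"],
    "ne_set": ["ne_set_psoa", "ne_set_quant"],
    "set_psoa": ["e_set", "ne_set_psoa"],
    "set_quant": ["e_set", "ne_set_quant"],
    "e_list": [], "ne_list": [], "e_set": [],
    "ne_set_psoa": [], "ne_set_quant": [],
}

def subtypes(t):
    """All types subsumed by t, including t itself."""
    seen = {t}
    stack = [t]
    while stack:
        for s in SUB[stack.pop()]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def unify_types(t1, t2):
    """Most general common subtype of t1 and t2, or None if there is none."""
    common = subtypes(t1) & subtypes(t2)
    for c in common:
        # c is the result if it subsumes every other common subtype.
        if common <= subtypes(c):
            return c
    return None
```

For example, unify_types("set_psoa", "ne_set") yields ne_set_psoa, whereas unify_types("list", "set") yields None, which corresponds to a unification failure.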
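
In the grammar above, the adj macro shares the variable Def between the adjective's own defness value and that of the nominal it modifies, and shares Ind between the two indices; the head_adjunct rule then unifies the value of mod with the head daughter. The effect is agreement in number, gender and definiteness. The Python sketch below mimics only this effect on the lexical entries listed above. The table LEXICON and the function head_adjunct_ok are hypothetical simplifications that compare atomic values instead of unifying feature structures.

```python
# Agreement-relevant features of some of the lexical entries above:
# (head type, number, gender, definiteness).
LEXICON = {
    "sepr":     ("noun", "sg", "masc", "indef"),
    "ha-sepr":  ("noun", "sg", "masc", "def"),
    "sparim":   ("noun", "pl", "masc", "indef"),
    "gadol":    ("adj",  "sg", "masc", "indef"),
    "ha-gadol": ("adj",  "sg", "masc", "def"),
    "gdolim":   ("adj",  "pl", "masc", "indef"),
}

# Subtypes of nominal in the type hierarchy.
NOMINAL = {"noun", "adj", "numeral"}

def head_adjunct_ok(head_word, adjunct_word):
    """True if the adjunct's mod value is compatible with the head daughter."""
    h_type, h_num, h_gen, h_def = LEXICON[head_word]
    a_type, a_num, a_gen, a_def = LEXICON[adjunct_word]
    if a_type != "adj" or h_type not in NOMINAL:
        return False
    # Shared Ind forces num and gend to agree; shared Def forces defness to agree.
    return (h_num, h_gen, h_def) == (a_num, a_gen, a_def)
```

Under these assumptions, head_adjunct_ok("ha-sepr", "ha-gadol") and head_adjunct_ok("sparim", "gdolim") hold, while head_adjunct_ok("ha-sepr", "gadol") fails on definiteness and head_adjunct_ok("sparim", "gadol") fails on number.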